Tags: r, parallel-processing, jobs

How can I access a job::job() output in R as a variable rather than an environment?


I am processing a large amount of lidar data (over 4 TB) and am running a height normalisation function on a LAScatalog using the lidR package. I have used job::job() to run this as a background job.

My code is as follows (please note that this is not a lidR question):

library(lidR)

las_dir <- "path/to/.las/files/"

las_cat <- readLAScatalog(las_dir, filter = "-drop_overlap -drop_class 6 7 9 13 14 15 16 17 18 0 -keep_random_fraction 0.1")

# Creating height normalise function using k-nearest neighbour inverse distance weighting

norm_height_knnidw <- function(chunk) {
  las <- readLAS(chunk)
  if (lidR::is.empty(las)) return(NULL)
  
  las <- normalize_height(las, algorithm = knnidw())
}

# Defining las catalog parameters for height normalisation

opt_chunk_size(las_cat) <- 250
opt_chunk_buffer(las_cat) <- 5
opt_output_files(las_cat) <- file.path(tempdir(), "{XLEFT}_normed")
opt_stop_early(las_cat) <- FALSE
opt <- list(automerge = TRUE)

job::job(las_normed = {
  options(mc.cores = 16) # on a high-powered computer with 24 cores
  # Running height normalisation function on las catalog
  catalog_apply(las_cat, norm_height_knnidw, .options = opt)
})

This ran without errors (it took 5 days and 9 hours in total), but the output is an environment, and the next step in my workflow cannot use it:

> class(las_normed)
[1] "environment"

> str(las_normed)
<environment: 0x00000293a7b4c538>

# starting to define chunk options for next lascatalog process
> opt_output_files(las_normed) <- paste0(las_output, "{*}_dtm_pitfree", overwrite = TRUE)
Error in ctg@chunk_options : 
  no applicable method for `@` applied to an object of class "environment"

Is there a way I can convert this to a usable variable? Or is there a step I missed when setting up the job? I have read through the job documentation, but I am struggling to understand where I have gone wrong or how to interact with the output. I have also gone through Chapter 7: Environments in Advanced R by Hadley Wickham, but I am still unsure how to use the environment output (I fully acknowledge that my limited understanding of environment objects is likely the cause of this, so any pointers to further advice on them are very welcome).


Solution

  • You didn't make any assignments in the job::job call, so no variables were saved to the las_normed environment other than .jobcode (plus .Random.seed if catalog_apply used random numbers). I'm guessing what you want is something like:

    job::job(las_normed = {
      options(mc.cores = 16) # on a high-powered computer with 24 cores
      # Running height normalisation function on las catalog
      res = catalog_apply(las_cat, norm_height_knnidw, .options = opt)
    })
    

    When it completes, the las_normed environment will contain the object res, which you can retrieve with las_normed$res.

    I'll risk stating the obvious and recommend you experiment with job::job using a less expensive operation until you are confident you know how it behaves before using it to perform a multi-day operation.
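The point about assignments can be illustrated without the job package at all. The sketch below is a base-R analogy only (an assumption, not job's actual implementation): it evaluates a code block inside a fresh environment with local(), much as a named job::job() block hands the variables it created back to you in an environment. A value that is computed but never assigned leaves the environment empty; an assigned value can then be pulled out with $, get() or as.list(). In the question's case, las_normed$res would hold whatever catalog_apply() returned. A cheap real-world check in RStudio would be something like job::job(test_env = { res <- sum(1:10) }), followed by test_env$res once the job has finished.

```r
# Base-R analogy only (assumption): mimics how a named job::job() block
# collects the variables created inside it into an environment.
# It does not use the job package or its internals.

# Case 1: the block computes a value but never assigns it
no_assign <- new.env()
invisible(local({
  sum(1:10)  # computed, then discarded: nothing is stored
}, envir = no_assign))
print(ls(no_assign))    # character(0): the environment is empty

# Case 2: the result is assigned to a name inside the block
with_assign <- new.env()
local({
  res <- sum(1:10)      # assignment creates 'res' in with_assign
}, envir = with_assign)
print(ls(with_assign))  # "res"

# Three equivalent ways to get the object out of the environment
value_dollar <- with_assign$res
value_get    <- get("res", envir = with_assign)
value_list   <- as.list(with_assign)$res
print(value_dollar)

# Tip: ls(env, all.names = TRUE) also shows hidden names such as ".jobcode"
```

The same $ access works on any environment, so after re-running the job with an assignment, las_normed$res is the object to pass to the next LAScatalog step.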