How to process a LAScatalog in parallel with lidR in R


I used to process a LIDAR catalog with the following code (using the LAScatalog processing engine from the great lidR package):

library(lidR)

lasdir <- "D:\\LAS\\"
output <- "D:\\LAS\\PRODUCTS\\"
epsg = "+init=epsg:25829"
res = 1

no_cores <- detectCores()
cat <- lascatalog(lasdir = lasdir, 
                  outputdir = output, 
                  pattern = '*COL.laz$|*COL.LAZ$',
                  catname = "Catalog",
                  clipcat = FALSE, clipcatbuf = FALSE, clipbuf = 1000, clipcatshape = clipcatshape,
                  cat_chunk_buffer = 20,
                  cores = no_cores, progress = TRUE,
                  laz_compression = TRUE, epsg = epsg,
                  retilecatalog = FALSE, tile_chunk_buffer = 10,
                  tile_chunk_size = 1000,
                  filterask = FALSE,
                  filter = "-keep_first -drop_z_below 2")

DEM_output <- paste0(output,"DEM_", str_pad(res, 3, "left", pad = "0"), "/")
opt_output_files(cat) <- paste0(DEM_output,"{ORIGINALFILENAME}") #set filepaths
DEM <- grid_terrain(cat, res = res, algorithm = "knnidw"(k = 5, p = 2)) 

After a library update, the cores parameter no longer seems to have any effect: the processing still completes, but it no longer runs in parallel. A message states: Option no longer supported. See ?lidR-parallelism.

How can I process a catalog in parallel now?


Solution

  • Since lidR 2.1.0 (July 2019) the opt_cores() function has been deprecated. See the changelog.

    The strategy used to process the tiles in parallel must now be explicitly declared by users. This is anyway how it should have been designed from the beginning! For users, restoring the exact former behavior implies only one change.

    In versions < 2.1.0 the following was correct:

    library(lidR)
    ctg <- catalog("folder/")
    opt_cores(ctg) <- 4L
    hmean <- grid_metrics(ctg, mean(Z))
    

    In versions >= 2.1.0 this must be explicitly declared with the future package:

    library(lidR)
    library(future)
    plan(multisession)
    ctg <- catalog("folder/")
    hmean <- grid_metrics(ctg, mean(Z))
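
    Applied to the workflow in the question, the same change might look like the sketch below. It assumes lidR >= 2.1.0 with the opt_*() catalog options and grid_terrain(); build_dem() is a hypothetical wrapper written for this illustration, not a lidR function, and the file-name pattern from the question is left out for brevity.

    ```r
    library(lidR)
    library(future)

    # Hypothetical wrapper (not part of lidR): builds a DEM from every tile in
    # `lasdir`, processing the tiles in parallel on `workers` R sessions.
    build_dem <- function(lasdir, output, res = 1, workers = availableCores()) {
      # Declare the parallel strategy; this replaces the old `cores` option.
      old_plan <- plan(multisession, workers = workers)
      on.exit(plan(old_plan), add = TRUE)  # restore the previous plan on exit

      ctg <- readLAScatalog(lasdir)
      opt_chunk_buffer(ctg) <- 20                       # buffer value from the question
      opt_filter(ctg) <- "-keep_first -drop_z_below 2"  # filter from the question
      opt_progress(ctg) <- TRUE

      # One output raster per input tile, written under <output>DEM_<res>/
      dem_dir <- paste0(output, "DEM_", formatC(res, width = 3, flag = "0"), "/")
      opt_output_files(ctg) <- paste0(dem_dir, "{ORIGINALFILENAME}")

      grid_terrain(ctg, res = res, algorithm = knnidw(k = 5, p = 2))
    }

    # Usage with the paths from the question:
    # DEM <- build_dem("D:\\LAS\\", "D:\\LAS\\PRODUCTS\\")
    ```

    Each worker loads its own chunk into memory, so using every available core is not always the best choice; a smaller workers value may be safer on large tiles.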
    

    This is also fully documented in the manual page named lidR-parallelism.

    ?lidR::`lidR-parallelism`
    

    chunk-based parallelism

    When processing a LAScatalog, the internal engine splits the dataset into chunks and each chunk is read and processed sequentially in a loop. But actually this loop can be parallelized with the future package. By default the chunks are processed sequentially, but they can be processed in parallel by registering an evaluation strategy. For example, the following code is evaluated sequentially:

    ctg <- readLAScatalog("folder/")
    out <- grid_metrics(ctg, mean(Z))
    

    But this one is evaluated in parallel with two cores:

    library(future)
    plan(multisession, workers = 2L)
    ctg <- readLAScatalog("folder/")
    out <- grid_metrics(ctg, mean(Z))
    

    With chunk-based parallelism any algorithm can be parallelized by processing several subsets of a dataset [...]

    To take full advantage of this new syntax you need to learn how future works. See the documentation of the future package.
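
    As a small illustration of the future side only (no lidR involved), the sketch below registers a strategy for the duration of one call and then restores the previous one. with_workers() is a hypothetical helper introduced here, not part of future; it relies on plan() returning the previously active plan when a new one is set.

    ```r
    library(future)

    # Hypothetical helper (not part of future or lidR): runs `fun` under a
    # multisession plan with `workers` background R sessions, then restores
    # whatever plan was active before the call.
    with_workers <- function(workers, fun) {
      old_plan <- plan(multisession, workers = workers)
      on.exit(plan(old_plan), add = TRUE)
      fun()
    }

    # Ask future how many workers the temporary plan provides.
    n <- with_workers(2L, function() nbrOfWorkers())
    print(n)
    ```

    Wrapping a lidR catalog call in such a helper keeps the parallel setup local to that call, instead of leaving a multisession plan active for the rest of the R session.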