Search code examples
rparallel-processingaggregateraster

r: zApply in parallel computing


I need to aggregate a rasterbrick into monthly values. Normally, this would be easy by using zApply function from raster package. However, I have a large rasterbrick and this would take a very long time.

So basically, I am wondering if this would be easy to do it with some libraries like parallel or clusterR but I have no clue how to parallelize this process

# create a random raster stack

library(raster)

lay <- stack()

for (i in 1:365){
  print(i)
  ras <- matrix(rnorm(500, mean = 21, sd = rnorm(21, 12, 4)))
  ras <- raster(ras)
  lay <- addLayer(lay, ras)
}

dats <- seq(as.Date('2000-01-01'), length.out = nlayers(lay), by = 'days')

lay <- setZ(lay, dats)

monthlies <- zApply(lay, by = format(dats,"%m"), fun = 'mean') # aggregate from daily to monthly.

Thanks!


Solution

  • Use foraech and doParallel packages

    You can use foreach and doParallel to achieve your result. You will need to:

    • Detect the number of your CPU cores with detectCores()
    • Initialize DoParallel to work with your CPU cores with registerDoParallel(numCores)
    • Setup the foreach loop with the needed packages, any init variable, and a method to combine the results.

    Your code will look like this:

    library(foreach)
    library(doParallel)
    library(raster)
    
    lay <- stack()
    
    ## Loading required package: iterators
    
    numCores <- detectCores()
    registerDoParallel(numCores)  # use multicore, set to the number of our cores
    
    lay <- foreach (i=1:365, .init = lay, .combine = addLayer , .packages = "raster") %dopar% {
      print(i)
      ras <- matrix(rnorm(500, mean = 21, sd = rnorm(21, 12, 4)))
      ras <- raster(ras)
    }
    
    dats <- seq(as.Date('2000-01-01'), length.out = nlayers(lay), by = 'days')
    lay <- setZ(lay, dats)
    monthlies <- zApply(lay, by = format(dats,"%m"), fun = 'mean') # aggregate from daily to monthly
    
    # When you're done, clean up the cluster
    stopImplicitCluster()
    

    Measuring speed improvement

    You can test the speed improvement using System.time(). These are my results:

    #Time with a standard for loop
    system.time({
      for (i in 1:365){
        print(i)
        ras <- matrix(rnorm(500, mean = 21, sd = rnorm(21, 12, 4)))
        ras <- raster(ras)
        lay <- addLayer(lay, ras)
      }
    })
    
    user  system elapsed 
    66.29    0.09   67.15 
    
    #Testing foreach loop time
    system.time({
      lay <- foreach (i=1:365, .init = lay, .combine = addLayer , .packages = "raster") %dopar% {
        print(i)
        ras <- matrix(rnorm(500, mean = 21, sd = rnorm(21, 12, 4)))
        ras <- raster(ras)
      }
    })
    
    user  system elapsed 
    21.72    0.09   25.58
    

    As we can see, there was an effective speed improvement by using this method.

    Hope this helps.