Search code examples
rforeachparallel-processingnested-loopsraster

How to parallelize a nested for loop involving rasters?


I am working with raster data and trying to crop and mask a variety of buffers to each raster in a raster stack for various locations. The result is a list of a list of rasters. I got the code to work for a small subset of the data, but now I am trying it over the whole dataset, and it is working very slowly. See example code:

# Example data ------------------------------------------------------------

#create example raster stack
r1 = raster(nrows=1000,ncol=1000,xmn=60,xmx=90,ymn=0,ymx=25)
rr = lapply(1:100, function(i) setValues(r1,runif(ncell(r1))))
rrstack=stack()
for (i in 1:length(rr)){
  stacknext=rr[[i]]
  rrstack=stack(rrstack,stacknext)
}


#create example shapefile list

lats=runif(26,min=0,max=25)
lons=runif(26,min=60,max=90)
exnames=paste0("city_",letters)
coords=data.frame(names=exnames,lats=lats,lons=lons)
coords_sf = st_as_sf(coords,coords=c("lons","lats"),crs=4326,dim ="XY")
circle1=st_buffer(coords_sf, 1E3)
circle100=st_buffer(coords_sf,1E5)
circle500=st_buffer(coords_sf,5E5)
circlist=list(circle1=circle1,circle100=circle100,circle500=circle500)

circlist_reproj=lapply(circlist,function(x) st_transform(x,crs(rrstack[[1]])))

start <- proc.time()
citlist <- vector(mode='list',length=nrow(circlist_reproj[[1]]))
dellist <- vector(mode='list',length=length(circlist_reproj))
mystack <- stack()
for(k in 1:nrow(circlist_reproj[[1]])) {
  for(j in 1:length(circlist_reproj)) { 
    for (i in 1:nlayers(rrstack)){
      maskraster <- raster::mask(rrstack[[i]],circlist_reproj[[j]][k,])
      maskraster <- raster::crop(maskraster,circlist_reproj[[j]][k,])
      mystack <- stack(mystack,maskraster)
    }
    dellist[[j]] <- mystack
    mystack <- stack()
  }
  citlist[[k]] <- dellist
  dellist <- vector(mode='list',length=length(circlist_reproj))
}
basetime <- proc.time()-start

#time taken for computation

 basetime
    user   system  elapsed 
 940.173   84.366 1029.688 

As you can see for a set of data smaller than what I have, the computation takes a while. I wanted to try and parallelize the processing but am having trouble figuring out how to do so. I have two issues with it right now. First because of the nature of the nested for loop, I am not sure which for loop I should change to foreach. According to this post, it looks like it is the first one, though I am not sure that stands for all nested for loops. When I make the first for loop foreach I then get the error Error in { : task 1 failed - "could not find function "nlayers"" I then try and add the package argument in the foreach call resulting in a nested for loop that looks like

foreach(k = 1:nrow(circlist_reproj[[1]], .packages='raster')) %dopar% {
  for(j in 1:length(circlist_reproj))  { 
    for (i in 1:nlayers(rrstack)) {
      maskraster <- raster::mask(rrstack[[i]],circlist_reproj[[j]][k,])
      maskraster <- raster::crop(maskraster,circlist_reproj[[j]][k,])
      mystack <- stack(mystack,maskraster)
    }
    dellist[[j]] <- mystack
    mystack <- stack()
  }
  citlist[[k]] <- dellist
  dellist <- vector(mode='list',length=length(circlist_reproj))
}

Which then gives the error

  unused argument (.packages = "raster")

So I am not sure how to properly apply the .packages argument to the foreach function. What am I doing wrong here?

EDIT

Taking @HenrikB's comment, I have looked at my code and reworked it. I now have the following foreach loops. Now the code completes, but it results in all null values.

cores <- detectCores()
cl <- makeCluster(cores[1]-2) #not to overload your computer
registerDoParallel(cl)

start <- proc.time()
citlist <- vector(mode='list',length=nrow(circlist_reproj[[1]]))
dellist <- vector(mode='list',length=length(circlist_reproj))
mystack <- stack()
foreach(k = 1:nrow(circlist_reproj[[1]])) %:%
  foreach(j = 1:length(circlist_reproj))%:%   
  foreach (i = 1:nlayers(rrstack), .packages=c('raster','sf')) %dopar% {
    maskraster <- raster::mask(rrstack[[i]],circlist_reproj[[j]][k,])
    maskraster <- raster::crop(maskraster,circlist_reproj[[j]][k,])
    mystack <- stack(mystack,maskraster)
    dellist[[j]] <- mystack
    mystack <- stack()
    citlist[[k]] <- dellist
    dellist <- vector(mode='list',length=length(circlist_reproj))
  }
partime <- proc.time()-start

Solution

  • After taking @Henrik's comments and reworking my code a bit, I was able to come up with a solution that solves the problem via parallelization, however it is slower than the base solving. But that is for another post. Here is the solution:

    cores <- detectCores()
    cl <- makeCluster(cores[1]-2) #not to overload your computer
    registerDoParallel(cl)
    
    citlist <- vector(mode='list',length=nrow(circlist_reproj[[1]]))
    dellist <- vector(mode='list',length=length(circlist_reproj))
    for(k in 1:nrow(circlist_reproj[[1]])) {
      for(j in 1:length(circlist_reproj)) { 
        parrasterstack <- foreach(i=1:nlayers(rrstack),.packages=c('raster','sf')) %dopar% {
        maskraster <- raster::mask(rrstack[[i]],circlist_reproj[[j]][k,])
        raster::crop(maskraster,circlist_reproj[[j]][k,])
        }
      parrasterstack <- stack(parrasterstack)
      dellist[[j]] <- parrasterstack
      parrasterstack <- NULL
      }
    citlist[[k]] <- dellist
    dellist <- vector(mode='list',length=length(circlist_reproj))
    }
    
    stopCluster(cl)