I am working with raster data and trying to crop and mask a variety of buffers to each raster in a raster stack for various locations. The result is a list of a list of rasters. I got the code to work for a small subset of the data, but now I am trying it over the whole dataset, and it is working very slowly. See example code:
# Example data ------------------------------------------------------------
#create example raster stack
r1 = raster(nrows=1000,ncol=1000,xmn=60,xmx=90,ymn=0,ymx=25)
rr = lapply(1:100, function(i) setValues(r1,runif(ncell(r1))))
rrstack=stack()
for (i in 1:length(rr)){
stacknext=rr[[i]]
rrstack=stack(rrstack,stacknext)
}
#create example shapefile list
lats=runif(26,min=0,max=25)
lons=runif(26,min=60,max=90)
exnames=paste0("city_",letters)
coords=data.frame(names=exnames,lats=lats,lons=lons)
coords_sf = st_as_sf(coords,coords=c("lons","lats"),crs=4326,dim ="XY")
circle1=st_buffer(coords_sf, 1E3)
circle100=st_buffer(coords_sf,1E5)
circle500=st_buffer(coords_sf,5E5)
circlist=list(circle1=circle1,circle100=circle100,circle500=circle500)
circlist_reproj=lapply(circlist,function(x) st_transform(x,crs(rrstack[[1]])))
start <- proc.time()
citlist <- vector(mode='list',length=nrow(circlist_reproj[[1]]))
dellist <- vector(mode='list',length=length(circlist_reproj))
mystack <- stack()
for(k in 1:nrow(circlist_reproj[[1]])) {
for(j in 1:length(circlist_reproj)) {
for (i in 1:nlayers(rrstack)){
maskraster <- raster::mask(rrstack[[i]],circlist_reproj[[j]][k,])
maskraster <- raster::crop(maskraster,circlist_reproj[[j]][k,])
mystack <- stack(mystack,maskraster)
}
dellist[[j]] <- mystack
mystack <- stack()
}
citlist[[k]] <- dellist
dellist <- vector(mode='list',length=length(circlist_reproj))
}
basetime <- proc.time()-start
#time taken for computation
basetime
user system elapsed
940.173 84.366 1029.688
As you can see for a set of data smaller than what I have, the computation takes a while. I wanted to try and parallelize the processing but am having trouble figuring out how to do so. I have two issues with it right now. First because of the nature of the nested for loop, I am not sure which for
loop I should change to foreach
. According to this post, it looks like it is the first one, though I am not sure that stands for all nested for loops. When I make the first for
loop foreach
I then get the error Error in { : task 1 failed - "could not find function "nlayers""
I then try and add the package argument in the foreach
call resulting in a nested for loop that looks like
foreach(k = 1:nrow(circlist_reproj[[1]], .packages='raster')) %dopar% {
for(j in 1:length(circlist_reproj)) {
for (i in 1:nlayers(rrstack)) {
maskraster <- raster::mask(rrstack[[i]],circlist_reproj[[j]][k,])
maskraster <- raster::crop(maskraster,circlist_reproj[[j]][k,])
mystack <- stack(mystack,maskraster)
}
dellist[[j]] <- mystack
mystack <- stack()
}
citlist[[k]] <- dellist
dellist <- vector(mode='list',length=length(circlist_reproj))
}
Which then gives the error
unused argument (.packages = "raster")
So I am not sure how to properly apply the .packages
argument to the foreach
function. What am I doing wrong here?
EDIT
Taking @HenrikB's comment, I have looked at my code and reworked it. I now have the following foreach
loops. Now the code completes, but it results in all null values.
cores <- detectCores()
cl <- makeCluster(cores[1]-2) #not to overload your computer
registerDoParallel(cl)
start <- proc.time()
citlist <- vector(mode='list',length=nrow(circlist_reproj[[1]]))
dellist <- vector(mode='list',length=length(circlist_reproj))
mystack <- stack()
foreach(k = 1:nrow(circlist_reproj[[1]])) %:%
foreach(j = 1:length(circlist_reproj))%:%
foreach (i = 1:nlayers(rrstack), .packages=c('raster','sf')) %dopar% {
maskraster <- raster::mask(rrstack[[i]],circlist_reproj[[j]][k,])
maskraster <- raster::crop(maskraster,circlist_reproj[[j]][k,])
mystack <- stack(mystack,maskraster)
dellist[[j]] <- mystack
mystack <- stack()
citlist[[k]] <- dellist
dellist <- vector(mode='list',length=length(circlist_reproj))
}
partime <- proc.time()-start
After taking @Henrik's comments and reworking my code a bit, I was able to come up with a solution that solves the problem via parallelization, however it is slower than the base solving. But that is for another post. Here is the solution:
cores <- detectCores()
cl <- makeCluster(cores[1]-2) #not to overload your computer
registerDoParallel(cl)
citlist <- vector(mode='list',length=nrow(circlist_reproj[[1]]))
dellist <- vector(mode='list',length=length(circlist_reproj))
for(k in 1:nrow(circlist_reproj[[1]])) {
for(j in 1:length(circlist_reproj)) {
parrasterstack <- foreach(i=1:nlayers(rrstack),.packages=c('raster','sf')) %dopar% {
maskraster <- raster::mask(rrstack[[i]],circlist_reproj[[j]][k,])
raster::crop(maskraster,circlist_reproj[[j]][k,])
}
parrasterstack <- stack(parrasterstack)
dellist[[j]] <- parrasterstack
parrasterstack <- NULL
}
citlist[[k]] <- dellist
dellist <- vector(mode='list',length=length(circlist_reproj))
}
stopCluster(cl)