r, parallel-processing, raster, snow, mclapply

Parallel processing of big rasters in R (Windows)


I'm using the doSNOW package, and more specifically the parLapply function, to reclassify (and subsequently perform other operations on) a list of big raster datasets (OS: Windows x64).

The code looks a little like this minimal example:

library(raster)
library(doSNOW)

#create list containing test rasters

x <- raster(ncol=10980,nrow=10980) 
x <- setValues(x,1:ncell(x)) 

list.x <- replicate( 9 , x )

#setting up cluster

NumberOfCluster <- 8
cl <- makeCluster(NumberOfCluster)
registerDoSNOW(cl)
junk <- clusterEvalQ(cl,library(raster))

#perform calculations on each raster

list.x <- parLapply(cl,list.x,function(x) calc(x,function(x) { x * 10 }))

#stop cluster

stopCluster(cl)

The code itself runs as intended. The problem occurs when I try to work with the results afterwards. I get this error message:

> plot(list.x[[1]])
Error in file(fn, "rb") : cannot open the connection
In addition: Warning message:
In file(fn, "rb") :
  cannot open file 'C:\Users\*****\AppData\Local\Temp\RtmpyKYdpY\raster\r_tmp_2016-02-29_133158_752_67867.gri': No such file or directory

As far as I understand, because the rasters are quite big, their values are stored in temp files on disk rather than in memory. When I close the snow cluster, these files can no longer be accessed.

So my question is, how can I access the data once the cluster is closed? Can I proceed using this method?

Thanks!


Solution

  • I had this exact problem while running the rasterize function inside a cluster in R.

    All tests worked perfectly, but when I scaled up to very large, fine-resolution rasters, I repeatedly got errors about temp files that I couldn't even find on my computer. The list object, which I needed to merge and write as one raster, was in R, but I could do nothing with it.

    After watching the temp file directory while the cluster was running, I noticed that closing the cluster automatically deletes all the temp files it created. I therefore had to run the merge and writeRaster functions inside the cluster; otherwise it failed with an error very similar to yours.
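One way to apply this to the code in the question is to have each worker write its result to a file with an explicit name, so nothing depends on the worker's temp directory once the cluster is stopped. The following is only a minimal sketch under some assumptions: it uses the base parallel package rather than doSNOW, pairs rasters with file names via clusterMap instead of parLapply, uses small rasters so it runs quickly, and the names out.dir and out.files are hypothetical.

```r
library(raster)
library(parallel)  # assumption: base 'parallel' instead of doSNOW

# small test rasters so the sketch runs quickly
x <- raster(ncol = 100, nrow = 100)
x <- setValues(x, 1:ncell(x))
list.x <- replicate(3, x)

# hypothetical output directory; in real use, pick a persistent folder.
# This one lives in the master's temp dir, which survives stopCluster()
# but not the end of the master R session.
out.dir <- file.path(tempdir(), "raster_out")
dir.create(out.dir, showWarnings = FALSE)
out.files <- file.path(out.dir, sprintf("scaled_%02d.grd", seq_along(list.x)))

cl <- makeCluster(2)
junk <- clusterEvalQ(cl, library(raster))

# each worker writes its result to the given file and returns only the path
written <- clusterMap(cl, function(r, f) {
  res <- calc(r, function(v) v * 10, filename = f, overwrite = TRUE)
  filename(res)
}, list.x, out.files)

stopCluster(cl)

# re-open the results from disk in the master session
list.x <- lapply(unlist(written), raster)
plot(list.x[[1]])
```

Returning file paths instead of raster objects also keeps the amount of data sent back to the master small. Any later step that needs all layers at once (for example a merge followed by writeRaster) can then be run on the re-opened files.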