Search code examples
rparallel-processingmapstime-seriesraster

writeRaster to NETCDF parallelisation R


I have a large rasterstack (s) with the following details:

class       : RasterStack
dimensions  : 510, 1068, 544680, 19358  (nrow, ncol, ncell, nlayers)
resolution  : 0.08333333, 0.08333333  (x, y)
extent      : -141, -52, 41, 83.5  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0
names       : Jan.1961.1, Jan.1961.2, Jan.1961.3, Jan.1961.4, Jan.1961.5, Jan.1961.6, Jan.1961.7, Jan.1961.8, Jan.1961.9, Jan.1961.10, Jan.1961.11, Jan.1961.12, Jan.1961.13, Jan.1961.14, Jan.1961.15, ...
time        : 1961-01-01 - 2013-12-31 (range)

Doing something like:

writeRaster( s,"PP", overwrite=TRUE, format="CDF", varname="P", varunit="mm", 
             longname="totals", xname="lon", yname="lat",zname="time",
             zunit="numeric")

takes more than 2 weeks to complete on my computer. How can I run this in parallel (may be via foreach loop and %dopar% command) to get the same results with shorter processing time?

sample data

s=brick(nrows=510, ncols=1068, xmn=-180, xmx=180, ymn=-90, ymx=90, crs="+proj=longlat +datum=WGS84", nl=193581)
dates=seq(as.Date("1961-01-01"), as.Date("2013-12-31"), by="day")
s<- setZ(s,dates)

NB: my true data is a rasterstack not brick.


Solution

  • You can try this code, but I did not really tested it on a big dataset. And I did not tested the ncecat part... I'll update it later, but you can try in the meantime.

    wd <- "~/Bureau/Tmp"
    
    # stack with 16 layers
    nl <- 16 # 19358
    s <- brick(nrows = 510,  ncols = 1068,
               xmn = -180, xmx = 180, ymn = -90, ymx = 90,
               crs = "+proj=longlat +datum=WGS84",
               nl = nl)
    dates <- seq(as.Date("1961-01-01"), as.Date("2013-12-31"), by = "day")
    s <- setZ(s, dates)
    
    require(foreach)
    require(doParallel)
    cl <- makeCluster(4)
    registerDoParallel(cl)
    
    tmp <- foreach(i = 1:nlayers(s)) %dopar% 
    {  
      r <- raster::raster(s, i)
      raster::writeRaster(r, 
                          filename = paste0(wd,
                            "/PP_", formatC(i, width = 6, flag = "0")),
                          overwrite=TRUE, format="CDF", varname="P", varunit="mm", 
                   longname="totals", xname="lon", yname="lat",zname="time",
                   zunit="numeric")
      rm(r)
    }
    stopCluster(cl)    
    
    ppfiles <- list.files(wd)[grep("PP_", list.files(wd))]
    system(paste0("ncecat ppfiles output.nc")