Tags: r, parallel-processing, raster

Does clusterApply in R affect the calculation of the average for each layer of a raster brick?


I have raster bricks for daily minimum and maximum temperatures from Daymet. The geographic extent is the continental U.S. Each brick has 365 layers, one for each day of the year. I want to calculate the mean daily temperature from a pair of min. and max. bricks for each year. It seems the overlay() function in the R raster package does what I want. For example:

library(raster)
tmin <- brick("tmin_2018.nc")
tmax <- brick("tmax_2018.nc")
tmean <- overlay(tmin, tmax, fun=mean)

The thing is, it takes a long time and a lot of RAM, more RAM than I have. I do have access to a high-performance computing cluster, and I have code for this process on the cluster that uses the OpenMPI utility to provide parallel processing. My test case only uses 5 layers, and I request 6 nodes from the cluster for the job.

library(iterators)
library(foreach)
library(snow)
library(parallel)
library(Rmpi)
library(compiler)
library(raster)
library(rgdal)
library(ncdf4)
print(mpi.universe.size())

nras <- brick("infil/ras_in/daymet_v3_tmax_2009_na.nc4")
xras <- brick("infil/ras_in/daymet_v3_tmin_2009_na.nc4")

cl <- makeCluster( mpi.universe.size() - 1, type='MPI' )

clusterExport(cl, "nras")
clusterExport(cl, "xras")
clusterExport(cl, c('overlay', 'writeRaster', 'brick'))

easy_fn <- function(i) {
        overlay(nras[[i]], xras[[i]], fun = mean)
        }

out <- clusterApply(cl, 1:5, fun=easy_fn)
outbrickb <- brick(out)
writeRaster(outbrickb, "outfil/rasout/testbrick.tif", format="GTiff", datatype="FLT4S", overwrite=TRUE)

And that also works. But the values are different. When I calculate the mean for one layer (day) with the first code chunk above, I get values between -55.255 and 31.08. After running a small number of layers as a test on the HPC cluster with the 2nd code chunk, I get values between -50 and 44.5. If things were off by decimal places, I wouldn't be concerned, but the values are far apart. Is there something I'm missing in how clusterApply() or my little easy_fn() works vs. just calling overlay() that would lead to different values?


Solution

  • It may be easier and faster to use terra instead of raster

    Example data

    library(terra)
    tmin <- rast(system.file("ex/logo.tif", package="terra")) / 10
    tmax <- tmin + 10
    

    Solution

    tavg1 <- mean(tmin, tmax)
    

    Note that this works in terra because it computes the parallel mean (by layer), whereas raster would return the mean of all layers. To get that with terra you would do

    m <- mean(c(tmin, tmax))
    

    The terra equivalent of raster's overlay is terra::lapp, and you might need it for more complex cases.

    tavg2 <- lapp(sds(tmin, tmax), function(x,y) (x + y)/2)
    
    all(values(tavg1) == values(tavg2))
    #[1] TRUE
    

    Now with raster

    Example data

    library(raster)
    tmin <- brick(tmin)
    tmax <- brick(tmax)
    

    Using an algebraic approach could be faster than overlay

    tavg3 <- (tmin + tmax) / 2
    tavg4 <- overlay(tmin, tmax, fun = mean)
    

    With your function

    easy_fn <- function(i) {
        overlay(tmin[[i]], tmax[[i]], fun = mean)
    }
    
    x <- lapply(1:nlayers(tmin), easy_fn)
    tavg5 <- stack(x)
    

    And now in parallel

    library(parallel)
    cl <- makeCluster(getOption("cl.cores", 2))
    clusterExport(cl, c('overlay', 'tmin', 'tmax'))
    out <- clusterApply(cl, 1:nlayers(tmin), fun=easy_fn)
    stopCluster(cl)
    tavg6 <- stack(out)
    
    all(values(tavg1) == values(tavg3))
    #[1] TRUE
    all(values(tavg1) == values(tavg4))
    #[1] TRUE
    all(values(tavg1) == values(tavg5))
    #[1] TRUE
    all(values(tavg1) == values(tavg6))
    #[1] TRUE
    

    All results are the same.

    Now with your larger datasets, and timing the different approaches

    fmx <- "daymet_v4_daily_hi_tmax_2015.nc"
    fmn <- "daymet_v4_daily_hi_tmin_2015.nc"
    library(terra)
    tmin <- rast(fmn)
    tmax <- rast(fmx)
    system.time(tavg1 <- mean(tmin, tmax))
    #   user  system elapsed 
    #   1.88    0.75    2.62 
    
    system.time(tavg2 <- lapp(sds(tmin, tmax), function(x,y) (x + y)/2))
    #   user  system elapsed 
    #   2.34    1.40    3.76 
    

    These are clearly the fastest approaches (see below). You can add a filename argument to lapp, but with mean you would also have to use writeRaster.
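
    For example, a minimal sketch of writing the result straight to disk (the output file names here are just placeholders):

    # lapp can write directly to disk via its filename argument
    tavg2 <- lapp(sds(tmin, tmax), function(x, y) (x + y) / 2,
                  filename = "tavg_lapp.tif", overwrite = TRUE)

    # mean() has no filename argument, so save its result with writeRaster
    tavg1 <- mean(tmin, tmax)
    writeRaster(tavg1, "tavg_mean.tif", overwrite = TRUE)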

    And with raster

    library(raster)
    tmin <- brick(fmn)
    tmax <- brick(fmx)
    system.time(tavg3 <- (tmin + tmax) / 2)
    #   user  system elapsed 
    #  13.94    4.42   18.47 
    

    In this case overlay is extraordinarily slow

    system.time(tavg4 <- overlay(tmin, tmax, fun = mean))
    #  user  system elapsed 
    # 478.06    5.78  488.16 
    
    system.time(x <- lapply(1:nlayers(tmin), easy_fn))
    #   user  system elapsed 
    # 474.03    5.14  482.42 
    tavg5 <- stack(x)
    

    It does go faster with parallelization, but not by enough to make it worth pursuing

    library(parallel)
    cl <- makeCluster(getOption("cl.cores", 2))
    clusterExport(cl, c('overlay', 'tmin', 'tmax'))
    system.time(out <- clusterApply(cl, 1:nlayers(tmin), fun=easy_fn))
    #   user  system elapsed 
    #   0.53    0.78  184.82 
    stopCluster(cl)
    tavg6 <- stack(out)
    

    I do not see a meaningful difference between the results of the sequential and parallelized approaches

    d <- tavg6 - tavg4
    d
    #class      : RasterBrick 
    #dimensions : 584, 284, 165856, 365  (nrow, ncol, ncell, nlayers)
    #resolution : 1000, 1000  (x, y)
    #extent     : -5802750, -5518750, -622500, -38500  (xmin, xmax, ymin, ymax)
    #crs        : +proj=lcc +lat_0=42.5 +lon_0=-100 +lat_1=25 +lat_2=60 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs 
    #source     : r_tmp_2021-11-20_103534_54328_41274.grd 
    #names      :       layer.1,       layer.2,       layer.3,       layer.4,       layer.5,       layer.6,       layer.7,       layer.8,       layer.9,      layer.10,      layer.11,      layer.12,      layer.13,      layer.14,      layer.15, ... 
    #min values : -9.536743e-07, -9.536743e-07, -9.536743e-07, -9.536743e-07, -9.536743e-07, -9.536743e-07, -9.536743e-07, -9.536743e-07, -9.536743e-07, -9.536743e-07, -9.536743e-07, -9.536743e-07, -9.536743e-07, -9.536743e-07, -9.536743e-07, ... 
    #max values :  9.536743e-07,  9.536743e-07,  9.536743e-07,  9.536743e-07,  9.536743e-07,  9.536743e-07,  9.536743e-07,  9.536743e-07,  9.536743e-07,  9.536743e-07,  9.536743e-07,  9.536743e-07,  9.536743e-07,  9.536743e-07,  9.536743e-07, ...
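
    The remaining per-cell differences are on the order of 1e-6, most likely single-precision rounding from intermediate storage rather than a real discrepancy between the methods. A quick sanity check, as a sketch:

    # largest absolute per-cell difference across all layers
    # (cellStats returns the per-layer maxima; expect a value around 1e-6)
    max(cellStats(abs(d), "max"))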