I have some code in terra that I'm running using the future/future.apply package, but I'm running into some memory issues (Error in eval(expr, p) : std::bad_alloc). I'm using the future package's "multisession" plan and leaving terra's memfrac at the default (0.6). I'm wondering if memfrac should be adjusted based on the number of workers. Additionally, I've seen that future.callr::callr may be useful, but I'm not sure if this will be beneficial in my scenario or if memfrac would be handled differently for that plan. I have a small example of code similar to what I am running below.
library(terra)
#> terra 1.7.78
library(future.apply)
#> Loading required package: future
library(stringr)
# Slope function
slopefun <- function(x) {
  fn <- paste0(str_remove(x, "\\.tif$"), "_slope.tif")
  terrain(rast(x), v = "slope", filename = fn, overwrite = TRUE)
  return(fn)
}
# Create a dataset
r <- rast(volcano,
          extent = ext(2667400, 2667400 + ncol(volcano) * 10,
                       6478700, 6478700 + nrow(volcano) * 10),
          crs = "EPSG:27200")
# List of file names
r_list<- list("test1.tif", "test2.tif", "test3.tif", "test4.tif")
# Write to those file names
writeRaster(r, filename = r_list[[1]], overwrite=TRUE)
writeRaster(r*2, filename = r_list[[2]], overwrite=TRUE)
writeRaster(r*3, filename = r_list[[3]], overwrite=TRUE)
writeRaster(r*4, filename = r_list[[4]], overwrite=TRUE)
nworkers<- 4
plan(strategy = "multisession", workers= nworkers) #Set up parallel
res_list<- future_lapply(r_list, FUN = slopefun)
plan(strategy = "sequential")
# This is more than 1. Do I need to divide this by nworkers?
# For example, in this case should memfrac be reduced to 0.25 or less since there are 4 workers?
terraOptions()$memfrac*nworkers
#> memfrac : 0.6
#> tolerance : 0.1
#> verbose : FALSE
#> todisk : FALSE
#> tempdir : C:/Users/socce/AppData/Local/Temp/Rtmp2puqqO
#> datatype : FLT4S
#> memmin : 1
#> progress : 3
#> [1] 2.4
Created on 2025-01-22 with reprex v2.1.1
It seems to me that your question, in more general terms, is whether, if you have n parallel processes with similar memory requirements that access the same physical RAM x, you need to account for the size of n to determine the maximum amount of RAM that each process can use.
I would say that each process should not use more RAM than x/n.
So in your example, with four processes and total RAM capped at 60%, I would use memfrac=0.15 for each process, or achieve something similar with memmax.
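The x/n rule above can be written out as a small helper. This is only a sketch of the arithmetic: per_worker_limits and the 16 GB figure are hypothetical, and it assumes memmax is given in GB and that the overall cap stays at terra's default of 0.6.

```r
# Hypothetical helper: split an overall memory cap evenly across workers.
# total_ram_gb is the machine's physical RAM (an assumed value below).
per_worker_limits <- function(total_ram_gb, nworkers, total_frac = 0.6) {
  list(
    memfrac = total_frac / nworkers,                # fraction of RAM per worker
    memmax  = total_ram_gb * total_frac / nworkers  # approximate cap in GB per worker
  )
}

lims <- per_worker_limits(total_ram_gb = 16, nworkers = 4)
lims$memfrac
lims$memmax
```

With 16 GB and four workers this gives a memfrac of 0.15 and a memmax of about 2.4 GB per worker.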
Either can be set with terraOptions or passed as an additional argument to most raster processing methods. If the additional arguments are used for another purpose, you can use wopt=list(memfrac=0.15).
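Here is a minimal sketch of how the question's example might apply this. It assumes that multisession workers are separate R sessions that do not inherit terra options set in the main session, so terraOptions is called inside the worker function. The memfrac argument of slopefun is added for illustration, and the files are written to tempdir() rather than the working directory.

```r
library(terra)
library(future.apply)

nworkers <- 4
worker_memfrac <- 0.6 / nworkers  # 0.15, the default cap shared by four workers

# Same demo raster as in the question, written to a temporary directory
r <- rast(volcano,
          extent = ext(2667400, 2667400 + ncol(volcano) * 10,
                       6478700, 6478700 + nrow(volcano) * 10),
          crs = "EPSG:27200")
files <- file.path(tempdir(), paste0("test", 1:4, ".tif"))
for (i in seq_along(files)) writeRaster(r * i, files[i], overwrite = TRUE)

slopefun <- function(x, memfrac) {
  # Set the limit in the worker's own session (assumed not inherited from main)
  terra::terraOptions(memfrac = memfrac)
  fn <- sub("\\.tif$", "_slope.tif", x)
  terra::terrain(terra::rast(x), v = "slope", filename = fn, overwrite = TRUE)
  fn
}

plan(multisession, workers = nworkers)
res_list <- future_lapply(files, slopefun, memfrac = worker_memfrac)
plan(sequential)
```

Passing the value as an argument keeps the limit in step with nworkers if the number of workers changes.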
memfrac sets a limit to the fraction of available (free) RAM that may be used. The problem with parallel processes is that if they all start at the same time, there may be a lot of RAM that seems available, but won't be, as it needs to be shared between the processes.
You can use terra::mem_info to investigate what happens.
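library(terra)
# r1 has no cell values, so it takes almost no memory itself
r1 <- rast(res=1/60)
mem_info(r1)
# r2 and r3 hold cell values in memory; repeating mem_info(r1)
# shows how the reported available RAM changes as they are created
r2 <- rast(res=1/60, vals=1)
mem_info(r1)
r3 <- rast(res=1/60, vals=1)
mem_info(r1)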