I am trying to run terra::extract to sum population (from a .tif) over global regions (from a .shp). Below is a reproducible example of the code I run. In this small example, memory use before (A) and after (B) the extract call, and after memory cleanup (C), is nearly identical. However, running the exact same code with a larger .tif and .shp leads to a very large spike in memory use (A: 142 MB, B: 14882 MB, C: 143.3 MB). I can imagine some difference because the datasets are larger and there is more computational effort, but this seems rather excessive. Is this normal, and if not, is there a way of clearing memory on the fly during terra::extract? I am also trying to run it on a stack of rasters, but after hours of running it unsurprisingly fails with an out-of-memory error.
library(terra)
myfun <- function(x) { as.integer(sum(x, na.rm = TRUE)) }
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f)
rf <- system.file("ex/elev.tif", package="terra")
x <- rast(rf)
paste0("A: ",memory.size()," MB.")
a <- terra::extract(x, v, myfun, bind=TRUE)
paste0("B: ",memory.size()," MB.")
gc()
paste0("C: ",memory.size()," MB.")
What happens is that first all raw data are extracted, and then the summary function is applied. If you have polygons that cover very many cells, that may not be a viable approach, and I will change that. For now, you may consider using zonal instead:
library(terra)
v <- vect(system.file("ex/lux.shp", package="terra"))
x <- rast(system.file("ex/elev.tif", package="terra"))
#a <- terra::extract(x, v, myfun)
v$ID <- 1:nrow(v)
z <- rasterize(v, x, "ID")
b <- zonal(x, z, sum, na.rm=TRUE)
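For the multi-layer case mentioned in the question, zonal accepts a multi-layer SpatRaster and returns one column per layer, so the same approach should work in a single call. A minimal sketch, where the second layer is just a synthetic copy for illustration:

library(terra)
v <- vect(system.file("ex/lux.shp", package = "terra"))
x <- rast(system.file("ex/elev.tif", package = "terra"))

# hypothetical two-layer stack, for illustration only
s <- c(x, x * 2)
names(s) <- c("pop_a", "pop_b")

v$ID <- 1:nrow(v)
z <- rasterize(v, x, "ID")

# one row per zone (polygon), one column per layer
b <- zonal(s, z, sum, na.rm=TRUE)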