Tags: r, large-data, r-raster, quantile, terra

Use 'global()' and 'quantile()' to compute the quantile values for a large raster map


I ran into problems when trying to compute the 0.05 and 0.95 quantile values of a large raster file (.tif).

I ran my scripts on a high-performance computer (see the specs below), but my R session still gets aborted. When I tested on a smaller raster file, everything went well and I got the quantile values, but when I replaced it with larger rasters, the R session was aborted again.

There is also something I don't understand: when I watch memory usage while my scripts run, it always climbs to around 50% before the R session aborts, no matter which computer I use. Even on a computer with less memory, usage also peaks at around 50%.

I suspect this is memory-related, since the scripts work fine on a small raster tile.
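
One possible explanation (I have not verified this): terra by default only allows itself a fraction of the available RAM (the memfrac option), which might be why usage tops out around the same percentage on every machine. The settings can be inspected with terraOptions() and mem_info(), both terra utility functions:

library(terra)
tile <- rast('D:/Data/Tiles/_56.tif')

# 'memfrac' is the fraction of RAM terra allows itself, among other options
terraOptions()

# memory terra needs to process this raster vs. how much it may use
mem_info(tile)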

I used the terra package with global() and quantile() to get the quantile values for my raster:

library(terra)

# Load the data
tile <- rast('D:/Data/Tiles/_56.tif')

# Create a data frame to store the outputs
q <- data.frame(Q.5 = as.numeric(NA), Q.95 = as.numeric(NA))
q$Q.5  <- global(tile, \(i) quantile(i, 0.05, na.rm=TRUE))
q$Q.95 <- global(tile, \(i) quantile(i, 0.95, na.rm=TRUE))

I also tried with the raster package:

library(raster)

# Load the data
r <- raster('I:/WarmingMagnitude_SSP370_v3.tif')

# Create a data frame to store the outputs
q <- data.frame(quantile0.05 = as.double(NA), quantile0.95 = as.double(NA))
q$quantile0.05 <- as.numeric(quantile(r, 0.05, na.rm = TRUE))
q$quantile0.95 <- as.numeric(quantile(r, 0.95, na.rm = TRUE))

Here you can find a small raster as an example; the large raster file (2.45 GB) can be downloaded here.

Here is information about my computer: [screenshot]

Here is the error I encountered: [screenshot]

Thanks in advance!

In short: I am trying to get the 0.05 and 0.95 quantile values of my large .tif file in R. Here is the raster:

> r
class       : SpatRaster 
dimensions  : 161239, 132862, 1  (nrow, ncol, nlyr)
resolution  : 25, 25  (x, y)
extent      : 2634975, 5956525, 1385025, 5416000  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs 
source      : WarmingMagnitude_SSP370_v3.tif 
name        : future microclimate SSP370 
min value   :                       -3.2 
max value   :                        8.9 

Solution

  • To get quantiles for a raster layer, across all cells, you can do

    library(terra)
    r <- rast(system.file("ex/elev.tif", package="terra"))
    global(r, quantile, probs=c(0.05, 0.95), na.rm=TRUE)
    #          X5. X95.
    #elevation 232  489
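
    If you want the result in a data frame shaped like the q in your question, you can take the columns from the data.frame that global returns (column names as in the printed output above):

    qv <- global(r, quantile, probs=c(0.05, 0.95), na.rm=TRUE)
    q <- data.frame(Q.5 = qv$X5., Q.95 = qv$X95.)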
    

    The problem you encounter is that to use the stats::quantile method, terra::global needs to load all cell values into memory. The show(r) output in your question indicates that this cannot work here: 161,239 rows × 132,862 columns is about 21.4 billion cells, which as double-precision values would need roughly 170 GB of RAM. If you look at ?global you see this warning:

    If x is very large global will fail, except when fun is one of "mean", "min", "max", "sum", "prod", "range" (min and max), "rms" (root mean square), "sd" (sample standard deviation), "std" (population standard deviation), "isNA" (number of cells that are NA), "notNA" (number of cells that are not NA).
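
    For example, these built-in functions are computed in a streaming fashion and work regardless of raster size:

    global(r, "range", na.rm=TRUE)   # min and max, without loading all values
    global(r, "sd", na.rm=TRUE)      # sample standard deviation, likewise streamed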

    I assume that you are not allowed to use all the RAM available on a node of your HPC, perhaps because it is shared among multiple jobs. That would be consistent with your sessions aborting at around 50% memory use.

    If indeed your raster has very many cells, you could take a regular sample, which should be close enough for many applications (regular samples are more efficient than random samples with spatial data).

    library(terra)
    r <- rast(system.file("ex/elev.tif", package="terra"))
    x <- spatSample(r, 10^6, "regular")
    quantile(x, probs=c(0.05, 0.95), na.rm=TRUE)
    # 5% 95% 
    #232 489 
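
    If sampling is not precise enough, you could also stream over the raster in row chunks yourself with terra's readStart()/readValues()/readStop() and accumulate a histogram; the quantiles can then be read off the cumulative counts. A sketch (assuming a single-layer raster; the bin count and chunk size are arbitrary choices):

    nbins <- 10000
    rng <- unlist(global(r, "range", na.rm=TRUE))  # min and max, computed out-of-memory
    breaks <- seq(rng[1], rng[2], length.out=nbins+1)
    counts <- numeric(nbins)

    # read the raster in chunks of rows and fill the histogram
    readStart(r)
    step <- 1000
    for (row in seq(1, nrow(r), by=step)) {
      n <- min(step, nrow(r) - row + 1)
      v <- readValues(r, row=row, nrows=n)
      v <- v[!is.na(v)]
      counts <- counts + tabulate(findInterval(v, breaks, all.inside=TRUE), nbins)
    }
    readStop(r)

    # approximate quantiles: upper edges of the bins where the CDF crosses the probabilities
    cum <- cumsum(counts) / sum(counts)
    c(q05 = breaks[which(cum >= 0.05)[1] + 1], q95 = breaks[which(cum >= 0.95)[1] + 1])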
    

    Note that the terra::quantile method returns the quantiles for each cell, across layers. That method is not likely to give you memory problems, but it is not what you are after, I think.

    # quantiles for each cell, across the two layers of this example raster
    q <- quantile(c(r, r*2))