Search code examples
rterra

Applying SPEI::hargreaves function to time series from each pixel of SpatRaster using terra package R?


library(SPEI)
library(terra)

Basically, given a monthly time series for two variables a and b as follows:

a = array(1:(3*4*12*64),c(3,4,12*64))
a = rast(a)

dates=seq(as.Date("1950-01-01"), as.Date("2013-12-31"), by="month")
terra::time(a)=dates
names(a) <- zoo::as.yearmon(time(a))

b = array(1:(3*4*12*64),c(3,4,12*64))
b = rast(b)

dates=seq(as.Date("1950-01-01"), as.Date("2013-12-31"), by="month")
terra::time(b) <- dates
names(b) <- zoo::as.yearmon(time(b))

Both a and b are time series with a time dimension. Now, I would like to apply the function SPEI::hargreaves to time series of each pixel in a and b and return a SpatRaster C

From the SPEI package , har <- hargreaves(TMIN,TMAX,lat=37.6475). Here, consider Tmin=a, Tmax=b and lat=latitude of each pixel which is the same in a and b. I will parallelize the process once I get an idea how to apply the function to my SpatRasters.

At the moment, I am using data.table to collapse my rasterbricks and then applying hargreaves to the table. This approach is very inefficient so far.

Here is what I have so far:

library(SPEI)
library(terra)
library(zoo)

har <- function(a, b, lat) {
  SPEI::hargreaves(as.vector(a), as.vector(b), lat,na.rm = TRUE)
} 

lat <- init(rast(a), "y")
PET1 <- terra::lapp(x=c(a, b, lat), fun = Vectorize(har))

Howvever, I get the error:

#Error: [lapp] cannot use 'fun'. The number of values returned is less 
#than the number of input cells.
#Perhaps the function is not properly vectorized

Thanks to @Robert Hijmans' answer, I experimented with terra and raster packages.

raster solution

library(SPEI)
library(raster)
library(zoo)

har <- function(Tmin, Tmax, lat) {
  SPEI::hargreaves(Tmin, Tmax, lat,na.rm = TRUE)
} 

lat <- init(raster(Tmin), "y")

PET_raster <- raster::overlay(Tmin, Tmax, lat, fun = Vectorize(har))

Using actual data, this takes more than 15 minutes to run on my laptop.

PET_raster 
class      : RasterBrick
dimensions : 68, 104, 7072, 1020  (nrow, ncol, ncell, nlayers)
resolution : 0.25, 0.25  (x, y)
extent     : -98.285, -72.285, 39.875, 56.875  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs
source     : r_tmp_2025-01-29_175851.174327_307486_90781.grd
names      :    layer.1,    layer.2,    layer.3,    layer.4,    layer.5,    layer.6,    layer.7,    layer.8,    layer.9,   layer.10,   layer.11,   layer.12,   layer.13,   layer.14,   layer.15, ...
min values :  0.0000000,  0.0000000,  0.0000000, 27.9045500, 40.7257577, 50.3362055, 41.2878256, 49.9874998, 47.6142315, 37.7349603,  7.4571957,  0.0000000,  0.0000000,  0.0000000,  0.0000000, ...
max values :   47.11585,   61.65605,   73.30580,  112.98168,  158.29348,  183.51450,  204.36278,  192.53549,  187.70001,  151.89846,   78.30323,   72.70451,   51.80651,   65.65896,   80.28027, ...

terra solution by @Robert Hijmans

library(SPEI)
library(terra)
library(zoo)


lat <- init(rast(Tmin), "y")

r <- c(Tmin, Tmax, lat)
nl <- nlyr(Tmin)
nl2 <- nl + nl

PET_terra <- app(r, \(i) apply(i, 1, 
                       \(j) SPEI::hargreaves(j[1:nl], j[(nl+1):(nl2)], lat=j[nl2+1], verbose=FALSE)))

Using actual data, this takes less than 15 seconds to run on my laptop. Terrific difference in speed but I doubt which of them (terra vs raster) is correct.

I have more than 10TB of data and will prefer terra in this case.

PET_terra 

class       : SpatRaster
dimensions  : 68, 104, 1020  (nrow, ncol, nlyr)
resolution  : 0.25, 0.25  (x, y)
extent      : -98.285, -72.285, 39.875, 56.875  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
source(s)   : memory
names       :    lyr.1,    lyr.2,    lyr.3,    lyr.4,     lyr.5,    lyr.6, ...
min values  :  0.00000,  0.00000,  0.00000, 13.75159,  31.91098,  35.7943, ...
max values  : 17.54078, 28.36545, 49.92122, 89.34686, 138.97809, 171.4730, ...

Problem

terra and raster give significantly different min and max values, implying that both packages are not doing the same calculations.

Any thoughts on the differences?


Solution

  • I thought that would be be possible with the below but that fails and perhaps that can be fixed.

    s <- sds(a, a+10, lat)
    names(s) <- c("Tmin", "Tmax", "lat")
    x <- lapp(s, hargreaves, usenames=TRUE, recycle=TRUE, verbose=FALSE)
    

    The below is ugly, but seems to work:

    r <- c(a, a+10, lat)
    nl <- nlyr(a)
    nl2 <- nl + nl
    x <- app(r, \(i) apply(i, 1, 
         \(j) hargreaves(j[1:nl], j[(nl+1):(nl2)], lat=j[nl2+1], verbose=FALSE)))
    
    x
    #class       : SpatRaster 
    #dimensions  : 3, 4, 768  (nrow, ncol, nlyr)
    #resolution  : 1, 1  (x, y)
    #extent      : 0, 4, 0, 3  (xmin, xmax, ymin, ymax)
    #coord. ref. :  
    #source(s)   : memory
    #names       :     lyr.1,    lyr.2,    lyr.3,    lyr.4,    lyr.5,    lyr.6, ... 
    #min values  :  77.02184, 109.5203, 166.0411, 197.9813, 234.9566, 256.6363, ... 
    #max values  : 115.17347, 145.1712, 204.9765, 232.4847, 266.2186, 284.1987, ... 
    

    I think this is correct because

    qed <- function(cell) {
      xhg <- as.vector(t(x[cell])[,1])
      Tmin <- unlist(a[cell])
      Tmax <- Tmin + 10
      Lat <- unlist(lat[cell])
      hg <- hargreaves(Tmin, Tmax, lat=Lat, verbose=F)
      all.equal(hg, xhg)
    }
    qed(5)
    #[1] TRUE