Search code examples
rrasterspatialterra

Remove outliers from Raster object with multiple layers


I have a NetCDF file at very high spatial resolution:

dimensions : 2160, 4320, 9331200, 172  (nrow, ncol, ncell, nlayers)
resolution : 0.0833333, 0.0833333  (x, y)
extent     : -180, 179.9998, -90, 89.99993  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 

The final objective is to calculate the mean across all 172 layers (i.e. the temporal dimension). Before doing that, I would like to remove outliers from the time series cell by cell. For instance using the formula: x ± 2.5 * std(x), where x is the timeseries of the data in every cell.

I tried transforming the Raster object into a dataframe, but it is quite slow given the resolution. I also tried using the functions clamp and reclassify from the raster package, but the values used to subset are fixed for the entire Raster object:

x <- clamp(x, lower = 0, upper = 2.5)

What I'm looking for is a function to remove outliers based on the timeseries of every cell (i.e. dynamic lower and upper range above).

Is there a way of doing this directly with the Raster format, either with the package Raster or Terra?

There is a similar question here but in that case the Raster object doesn't have multiple layers (corresponding to the temporal dimension).


Solution

  • Example data (please always include some when asking an R question)

    library(terra)
    s <- rast(ncol=10, nrow=10, nlyr=30)
    set.seed(1)
    values(s) <- rnorm(size(s), 10)
    s[1] <- NA
    

    Solution 1: Compute the mean and sd, and use ifel

    mn <- mean(s)
    sdv <- stdev(s) * 1.5
    lower <- mn-sdv
    upper <- mn+sdv
    
    x <- ifel(s < lower, NA, ifel(s > upper, NA, s))
    

    Check the results for a cell

    i <- 57
    round(s[i], 2)
    #  lyr.1 lyr.2 lyr.3 lyr.4 lyr.5 lyr.6 lyr.7 lyr.8 lyr.9 lyr.10 lyr.11 lyr.12 lyr.13
    #1  9.63    11   7.6  8.83 10.51  7.67    10 12.02  9.23  10.96  11.21  10.12  10.61
    #  lyr.14 lyr.15 lyr.16 lyr.17 lyr.18 lyr.19 lyr.20 lyr.21 lyr.22 lyr.23 lyr.24 lyr.25
    #1   8.83  10.26  11.73   9.49   8.73   9.02  11.75   7.92   9.95    9.6    9.5   9.46
    #  lyr.26 lyr.27 lyr.28 lyr.29 lyr.30
    #1   9.72  11.38  10.61   7.35  10.36
    
    cbind(lower[i], upper[57])
    #      mean    mean
    #1 7.987305 11.6816
    
    round(x[i], 2)
    #  lyr.1 lyr.2 lyr.3 lyr.4 lyr.5 lyr.6 lyr.7 lyr.8 lyr.9 lyr.10 lyr.11 lyr.12 lyr.13
    #1  9.63    11    NA  8.83 10.51    NA    10    NA  9.23  10.96  11.21  10.12  10.61
    #  lyr.14 lyr.15 lyr.16 lyr.17 lyr.18 lyr.19 lyr.20 lyr.21 lyr.22 lyr.23 lyr.24 lyr.25
    #1   8.83  10.26     NA   9.49   8.73   9.02     NA     NA   9.95    9.6    9.5   9.46
    #  lyr.26 lyr.27 lyr.28 lyr.29 lyr.30
    #1   9.72  11.38  10.61     NA  10.36
    

    Solution 2: use app

    f <- function(x) {
        m <- mean(x) + c(-1,1) * sd(x)
        x[x < m[1] | x > m[2]] <- NA
        x
    }
    y <- app(s, f)
    

    Solution 3. With terra 1.7.20 (currently the development version) you can now also do

     z <- clamp(s, lower, upper, values=FALSE)