
In a Raster object, how to set the mean to NA if there are too many NAs in one cell?


I am working with environmental data in the Raster format. I have one Raster object where every layer corresponds to a specific time step of observations.

The final objective is to calculate the mean of every cell across the temporal dimension. However, since the data contain many NAs, I would like to set the mean to NA if there are too many NAs in the time series of a given cell. This ensures that the calculated mean is robust, i.e. derived from a sufficient number of actual observations (e.g. I don't want one mean value calculated from 100 observations and another based on 3 observations and 97 NAs).

In the example below, I would like to set the mean to NA if there are 2 or more NAs in the time series.

library(terra)
s <- rast(ncol=10, nrow=10, nlyr=30)
set.seed(1)
values(s) <- rnorm(size(s), 10)
s[3] <- NaN # set one cell to NA (across all layers)
s[[4]] <- NaN # set a whole layer to NA
s[[1]][1] <- NaN # set some individual cells to NA
s[[5]][4] <- NaN
mn <- mean(s, na.rm = TRUE) # na.rm = FALSE would return NA for any cell with at least one NA

Solution

  • Example data

    library(terra)
    s <- rast(ncol=10, nrow=10, nlyr=30)
    set.seed(1)
    values(s) <- sample(c(1:10, NA), size(s), replace=TRUE)
    

    You can count the NAs in each cell and then use a threshold on that count to mask the original data:

    # number of NA layers per cell
    sna <- sum(is.na(s))
    # set cells with more than 2 NAs to NA in every layer
    x <- mask(s, sna > 2, maskvalues=TRUE)
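
    To get from the masked data to the mean the question asks for, one option is to compute the mean with na.rm=TRUE and then mask it with the same NA count. The following is a minimal sketch, not necessarily how the answer's author would finish it. The name max_na is hypothetical, and its value of 1 is an assumption taken from the question's "2 or more NAs" rule (the example above uses a threshold of 2 instead).

```r
library(terra)

# Same example data as above
s <- rast(ncol=10, nrow=10, nlyr=30)
set.seed(1)
values(s) <- sample(c(1:10, NA), size(s), replace=TRUE)

# max_na (hypothetical name): the largest number of NAs a cell may have
# and still receive a mean. "NA if 2 or more NAs" corresponds to max_na = 1.
max_na <- 1

# number of NA layers per cell
sna <- sum(is.na(s))

# mean of the available observations in each cell
mn <- mean(s, na.rm=TRUE)

# set the mean to NA wherever the cell has more than max_na NAs
mn <- mask(mn, sna > max_na, maskvalues=TRUE)
```

    Taking mean(x, na.rm=TRUE) of the masked stack x should give the same cells as NA; masking the single-layer mean simply avoids writing out a masked copy of all layers. To express the threshold as a fraction of the time series instead of a count, something like sna > 0.1 * nlyr(s) could replace sna > max_na.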