Search code examples
rtidyverserasterspatial

Subsetting a rasterbrick to give mean of three minimum months in each year


I'm interested in creating two variables from a time-series of spatial raster data in R from an netcdf file. Opening the data, subsetting by Z and creating a mean value is straight forward:

# Working example below using a ~16mb nc file of sea ice from HadISST

library(R.utils)
library(raster)
library(tidyverse)

download.file("https://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST_ice.nc.gz","HadISST_ice.nc.gz")
gunzip("HadISST_ice.nc.gz", ext="gz", FUN=gzfile)

hadISST <- brick('Datasets/HadISST/HadISST_ice.nc') 

# subset to a decade time period and create decadal mean from monthly data
hadISST_a <- hadISST %>% subset(., which(getZ(.) >= as.Date("1900-01-01") & getZ(.) <= as.Date("1909-12-31"))) %>% mean(., na.rm = TRUE)

But, I'm interested in extracting 1) annual mean values, and 2) annual mean of three minimum monthly values for the subsetted time period. My current work flow is using nc_open() and ncvar_get() to open the data, raster::extract() to get the values and then tidverse group_by() and slice_min() to get the annual coolest months, but it's a slow and cpu intensive approach. Is there a more effective way of doing this without converting from raster - data.frame?

Questions:

  1. Using the above code how can I extract annual means rather than a mean of ALL months over the decadal period?

  2. Is there a way of using slice_min(order_by = sst, n = 3) or similar with brick objects to get the minimum three values per year prior to annual averaging?


Solution

  • Example data

    if (!file.exists("HadISST_ice.nc")) { 
         download.file("https://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST_ice.nc.gz","HadISST_ice.nc.gz")
         R.utils:::gunzip("HadISST_ice.nc.gz")
    }
    library(terra)
    hadISST <- rast('HadISST_ice.nc') 
    

    Annual mean

    y <- format(time(hadISST), "%Y")
    m <- tapp(hadISST, y, mean)
    

    Mean of the lowest three monthly values by year (this takes much longer because a user-defined R function is used). I now see that there is a bug in the CRAN version. You can instead use version 1.5-47 that you can install like this install.packages('terra', repos='https://rspatial.r-universe.dev').

    f <- function(i) mean(sort(i)[1:3])
    m3 <- tapp(hadISST, y, f)
    

    To make this faster (if you have multiple cores):

    m3 <- tapp(hadISST, y, f, cores=4)