Search code examples
rstackmeanraster

Calculating mean, median and standard deviation in stack raster for different time steps


I have a raster brick/stack (using the raster package) in R for 45 years of annual rainfall data from 1970 to 2015. I want to calculate mean, median and standard deviation for a given year e.g. 2015 using the last 5 years, 10 years, 15 years, 20 years, 30 years. I want to do it from 2000 to 2015, where this processes repeated for every year using the stacked data and save the newly derived rasters for a given year. This the example code. Any help is greatly appreciated.

raster <- raster(ncol=10, nrow=10)
raster_brick <- brick( sapply(1:45, function(i) setValues(r, rnorm(ncell(r), i, 3))))
plot(raster_brick)
str(raster_brick)

Solution

  • To achieve this task, we can use the calc function from the raster package. We also need to know how to subset the RasterBrick object.

    Data preparation

    library(raster)
    set.seed(123)
    r <- raster(ncol=10, nrow=10)
    r_brick <- brick(sapply(1:45, function(i) setValues(r, rnorm(ncell(r), i, 3))))
    

    Example of calc function

    calc can apply a function for all layers in a RasterBrick object. The end result is a raster layer.

    # Calculate mean
    r_mean <- calc(r_brick, mean)
    # Calculate median
    r_median <- calc(r_brick, median)
    # Calculate sd
    r_sd <- calc(r_brick, sd)
    

    Notice that r_mean, r_median, and r_sd are all RasterLayer.

    Example of subset a RasterBrick

    We can use the index to subset the layer. For example,

    r_sub <- r_brick[[1:3]]
    

    r_sub is the first three layers of r_brick

    A function to Conduct the Analysis

    Knowing the technique of calc and subset, we can design a function to conduct the analysis.

    The first thing to do is create a vector serving as a reference to year and index.

    # Create the index
    ind <- 1:45
    names(ind) <- 1971:2015
    

    Calling the year number to ind will return the index. For example,

    # Get the index of 2015
    ind[as.character(2015)]
    #2015 
    #  45
    

    Now design the function, which has five arguments

    end_year: The end year of analysis

    n_year: The last n year in terms of the end year

    FUN: A function, such as mean, median, and sd

    index: The year index (ind)

    ras_brick: RasterBrick to work with

    # Define the function
    
    raster_stat <- function(end_year, n_year, FUN, index, ras_brick){
      
      # Subset the raster
      index_temp <- index[as.character((end_year - n_year + 1):end_year)]
      ras_brick_temp <- ras_brick[[index_temp]]
      
      # Calculate the statistics
      ras_result <- calc(ras_brick_temp, FUN)
      
      # Set the name
      names(ras_result) <- paste("Y", end_year, n_year, substitute(FUN), sep = "_")
      
      return(ras_result)
    }
    

    Now we can test the function.

    raster_stat(2015, 5, FUN = sd, index = ind, ras_brick = r_brick)
    #class       : RasterLayer 
    #dimensions  : 10, 10, 100  (nrow, ncol, ncell)
    #resolution  : 36, 18  (x, y)
    #extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
    #coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
    #data source : in memory
    #names       : Y_2015_5_sd 
    #values      : 12.05333, 14.61298  (min, max)
    

    Notice that the result of the raster_stat function has the name Y_2015_5_sd. This is helpful to identify which end_year, n_year, and FUN was applied.

    Apply the function

    We can use for loop to apply raster_stat through all end_year and n_year. Here is an example calculating the mean.

    # Set the range of end_year and n_year
    end_year_vec <- 2000:2015
    n_year_vec <- c(5, 10 , 15, 20, 30)
    
    # Create an empty list to store result
    r_mean_list <- list()
    
    for (i in end_year_vec){
      for(j in n_year_vec){
          result_temp <- raster_stat(end_year = i, n_year = j, FUN = mean, 
                                     index = ind, ras_brick = r_brick)
          # Add the raster layer to the result_list
          r_mean_list[[names(result_temp)]] <- result_temp
      }
    } 
    

    All the results are stored in r_mean_list with a unique name. We can use the same approach for median and sd.