Search code examples
rparallel-processingspatialrastertrend

R: How to use parallel computing on calculating trends in raster


Recently I came across over this library in R spatialEco. I want to calculate the Kendall tau statistic for a raster stack in R. However, this would take a lot of time since this library is using just one core on the computer (the raster I plan to use is at a global extent at 250 m resolution)

library(raster)
library(spatialEco)

r.logo <- stack(system.file("external/rlogo.grd", package="raster"),
system.file("external/rlogo.grd", package="raster"),
system.file("external/rlogo.grd", package="raster"))

# Calculate trend slope with p-value and confidence level(s)
start_time <- Sys.time()

logo.trend <- raster.kendall(r.logo, tau = TRUE, intercept = TRUE, p.value = TRUE,
z.value = TRUE, confidence = TRUE)

end_time <- Sys.time()
end_time - start_time

names(logo.trend) <- c("slope","tau", "intercept", "p.value", "z.value", "LCI", "UCI")
plot(logo.trend)

Is is possible to use a library like library(parallel) in order to calculate the trend on a raster stack? Is it necessary to convert the data to a matrix and then use these libraries?


Solution

  • spatialEco::raster.kendall() calls raster::overlay() and one can run that in parallel:

    1. Get the relevant function from spatialEco::raster.kendall():

      trend.slope <- function(y, p.value.pass = TRUE, z.pass = TRUE, 
                              tau.pass = TRUE, confidence.pass = TRUE, intercept.pass = TRUE) {
          options(warn = -1)
          fit <- EnvStats::kendallTrendTest(y ~ 1)
          fit.results <- fit$estimate[2]
          if (tau.pass == TRUE) {
              fit.results <- c(fit.results, fit$estimate[1])
          }
          if (intercept.pass == TRUE) {
              fit.results <- c(fit.results, fit$estimate[3])
          }
          if (p.value.pass == TRUE) {
              fit.results <- c(fit.results, fit$p.value)
          }
          if (z.pass == TRUE) {
              fit.results <- c(fit.results, fit$statistic)
          }
          if (confidence.pass == TRUE) {
              ci <- unlist(fit$interval["limits"])
              if (length(ci) == 2) {
                  fit.results <- c(fit.results, ci)
              }
              else {
                  fit.results <- c(fit.results, c(NA, NA))
              }
          }
          options(warn = 0)
          return(fit.results)
      }
      
    2. Start a cluster with n nodes.

      beginCluster(n=2)
      
    3. Do the calculations in parallel.

      logo.trend.parallel <- clusterR(r.logo, overlay, args=list(fun=trend.slope))
      
    4. Stop cluster.

      endCluster()