parallel processing in R using snow


I have thousands of lists, and each list contains multiple time series. I would like to apply forecasting to each element in each list. This has become an intractable problem in terms of computing resources. I don't have a background in parallel computing or advanced R programming. Any help would be greatly appreciated.

I have created a dummy list. Basically, dat.list is similar to what I'm working on.

    library("snow")
    library("plyr")
    library("forecast")

    ## Create Dummy Data

    z <- ts(matrix(rnorm(30,10,10), 100, 3), start = c(1961, 1), frequency = 12)
    lam <- 0.8
    ap <- list(z=z,lam=lam)

    ## Second dummy series
    z <- ts(matrix(rnorm(30,10,10), 100, 3), start = c(1971, 1), frequency = 12)
    lam <- 0.5
    zp <- list(z=z,lam=lam)

    dat.list <- list(ap=ap,zp=zp)

    ## Forecast using lapply
    xa <- proc.time()
    tt <- lapply(dat.list,function(x) lapply(x$z,function(y) (forecast::ets(y))))
    xb <- proc.time()

The above code gives me what I need. I would like to apply parallel processing to both lapply calls in the code above, so I attempted to use the snow package, following an example shown on this site.

    ## Parallel Processing


    clus <- makeCluster(3)
    custom.function <- function(x) lapply(x$z,function(y) (forecast::ets(y)))
    clusterExport(clus,"custom.function")

    x1 <- proc.time()
    tm <- parLapply(clus,dat.list,custom.function)
    x2<-proc.time()

    stopCluster(clus)

Below are my questions,

  1. For some reason, the output of tm is different from that of the non-parallel version. The forecast function ets is applied to every single data point rather than to each element in the list.

Non-parallel version:

    summary(tt)
       Length Class  Mode
    ap 3      -none- list
    zp 3      -none- list

Parallel Version:

    summary(tm)
       Length Class  Mode
    ap 300    -none- list
    zp 300    -none- list

  2. My second question is how I should parallelize the lapply inside the custom function, basically a nested parLapply:

    custom.function <- function(x) parLapply(clus,x$z,function(y) (forecast::ets(y))) ## Not working

Many thanks for your help.


Solution

  • The problem is that the forecast package isn't loaded on the cluster workers, which causes lapply to iterate over the ts objects incorrectly. You can load forecast on the workers using clusterEvalQ:

    clusterEvalQ(clus, library(forecast))
    

    To answer your second question, your attempt at nested parallelism failed because the workers don't have snow loaded or clus defined. But if you have thousands of lists, you should have plenty of ways to keep all of your cores busy without worrying about nested parallelism. Nesting is more likely to hurt your performance than help it, and it doesn't seem necessary.
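
    For reference, here is a minimal sketch that puts both points together. It assumes the same snow and forecast packages as the question; make.series and split.columns are hypothetical helpers introduced only for this example, and the dummy data only mimics the shape of dat.list. The first part reruns the original parLapply call after loading forecast on the workers. The second part is one possible way to avoid nesting: flatten the lists into one task per series and regroup the results afterwards.

```r
library("snow")
library("forecast")

## Dummy data shaped like the question's dat.list (values are random).
## make.series is a hypothetical helper, not part of the original post.
make.series <- function(start.year) {
  ts(matrix(rnorm(300, 10, 10), 100, 3),
     start = c(start.year, 1), frequency = 12)
}
dat.list <- list(ap = list(z = make.series(1961), lam = 0.8),
                 zp = list(z = make.series(1971), lam = 0.5))

custom.function <- function(x) lapply(x$z, function(y) forecast::ets(y))

clus <- makeCluster(3)

## Fix from the answer: load forecast on every worker before dispatching work
invisible(clusterEvalQ(clus, library(forecast)))

## 1. Same call as in the question, now with forecast loaded on the workers
tm <- parLapply(clus, dat.list, custom.function)

## 2. Alternative without nesting: one task per series.
## Columns are split explicitly (hypothetical helper split.columns), so this
## part does not rely on how lapply treats a multivariate ts object.
split.columns <- function(x) lapply(seq_len(ncol(x$z)), function(j) x$z[, j])
tasks <- unlist(lapply(dat.list, split.columns), recursive = FALSE)
fits <- parLapply(clus, tasks, function(y) forecast::ets(y))

stopCluster(clus)

## Regroup the flat results by the list each series came from
owner <- rep(names(dat.list), sapply(dat.list, function(x) ncol(x$z)))
grouped <- split(fits, owner)

sapply(tm, length)       # models per list from the nested version
sapply(grouped, length)  # models per list from the flattened version
```

    With thousands of lists, either version should give the workers enough independent tasks. The flattened version simply makes the unit of work smaller (one series instead of one list), which may balance the load better when lists differ a lot in size.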