Tags: r, parallel-processing, scope

Passing an entire package to a snow cluster


I'm trying to parallelize (using snow::parLapply) some code that depends on a package (i.e., a package other than snow). Objects referenced in the function called by parLapply must be explicitly passed to the cluster using clusterExport. Is there any way to pass an entire package to the cluster rather than having to explicitly name every function (including a package's internal functions called by user functions!) in clusterExport?


Solution

  • Install the package on all nodes, and have your code call library(thePackageYouUse) on all nodes via one of the available commands, e.g. something like

     clusterEvalQ(cl, library(thePackageYouUse))
    

    I think the parallel package, which comes with recent R releases, has examples -- see for instance this snippet from help(clusterApply), where the boot package is loaded on every worker (a fuller end-to-end sketch follows after this example):

     ## A bootstrapping example, which can be done in many ways:
     clusterEvalQ(cl, {
       ## set up each worker.  Could also use clusterExport()
       library(boot)
       cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
       cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
       NULL
     })
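
For completeness, here is a minimal end-to-end sketch using the parallel package (the successor to snow). The choice of MASS and fitdistr() is purely illustrative, standing in for whatever package and function your code actually needs:

     library(parallel)

     ## Hypothetical sketch: MASS stands in for thePackageYouUse, and
     ## fitdistr() for whatever package function the parallel code calls.
     cl <- makeCluster(2)

     ## Load the package on every worker; its internal functions come along
     ## automatically, so nothing has to be named in clusterExport().
     clusterEvalQ(cl, library(MASS))

     ## Each worker can now call MASS functions directly.
     res <- parLapply(cl, 1:4, function(i) {
       x <- rnorm(100, mean = i)
       fitdistr(x, "normal")$estimate
     })

     stopCluster(cl)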