I'm trying to parallelize some code (using snow::parLapply) that depends on a package (i.e., a package other than snow). Objects referenced in the function called by parLapply must be explicitly passed to the cluster using clusterExport. Is there any way to pass an entire package to the cluster, rather than having to explicitly name every function (including a package's internal functions called by user functions!) in clusterExport?
Install the package on all nodes, and have your code call library(thePackageYouUse) on all nodes via one of the available commands, e.g. something like
clusterEvalQ(cl, library(thePackageYouUse))
I think the parallel package, which comes with recent R releases, has examples. See for example this one from help(clusterApply), where the boot package is loaded everywhere:
## A bootstrapping example, which can be done in many ways:
clusterEvalQ(cl, {
    ## set up each worker. Could also use clusterExport()
    library(boot)
    cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
    cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
    NULL
})
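To tie this back to the original question, here is a minimal end-to-end sketch of the same idea with parLapply. It assumes a local two-worker cluster created with parallel::makeCluster, and it uses the base package tools purely as a stand-in for thePackageYouUse (any package installed on every node should behave the same way). The names loaded and exts are just illustrative.

```r
library(parallel)

# Assumption: two local worker processes; adjust for a real cluster.
cl <- makeCluster(2)

# Attach the package on every worker. `tools` is only a stand-in for
# thePackageYouUse; each worker reports whether it is on its search path.
loaded <- clusterEvalQ(cl, {
  library(tools)
  "package:tools" %in% search()
})

# file_ext() comes from tools. It is never passed to clusterExport();
# the workers find it because the package is attached there.
exts <- parLapply(cl, c("a.txt", "b.csv"), function(f) file_ext(f))

stopCluster(cl)
```

Because the whole package is attached on each worker, its internal functions are available too, so nothing from the package needs to be listed in clusterExport. Only your own objects (data, helper functions defined in your session) would still have to be exported.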