Tags: r, foreach, parallel-processing, mpi, hpc

run Rmpi on cluster, specify library path


I'm trying to run an analysis in parallel on our computing cluster.
Unfortunately I've had to set up Rmpi myself and may not have done so properly. Because I had to install all necessary packages into my home folder, I always have to call

.libPaths('/home/myfolder/Rlib');

before I can load packages.

However, it appears that doMPI attempts to load itself before I can set the library path.

.libPaths('/home/myfolder/Rlib');
cat("Step 1")
library(doMPI)
cl <- startMPIcluster()
registerDoMPI(cl)
cat("Step 2")
Children_mcmc1 = foreach(i=1:2) %dopar% {
    cat("Step 3")
    .libPaths('/home/myfolder/Rlib');
    library(MCMCglmm)
    cat("Step 4")
    load("krmh_married.rdata")
    nitt = 1000; thin = 50; burnin = 100
    MCMCglmm( children ~ paternalage.factor ,
        random=~idParents,
        family="poisson", 
        data=krmh_married, 
        pr = F, saveX = T, saveZ = T,
        nitt=nitt,thin=thin,burnin=burnin)
}
closeCluster(cl)
mpi.quit()

If I do

mpirun -H localhost -n 3 R --slave -f "3 - krmh mcmcglmm scc test 2.r" 

I get (after removing some boilerplate messages)

During startup - Warning message:
Step 1
Step 1
Step 1
Step 2Error in { : task 2 failed - "cannot open the connection"
Calls: %dopar% ->
Execution halted

If I do

R --slave -f "3 - krmh mcmcglmm scc test 2.r" 

I get

Step 1
Error in library(doMPI) : there is no package called 'doMPI'
Calls: local ... eval -> suppressMessages -> withCallingHandlers -> library
Execution halted
Error in library(doMPI) : there is no package called 'doMPI'
Calls: local ... eval -> suppressMessages -> withCallingHandlers -> library
Execution halted

I've tried installing doMPI on the fly, but even though Step 2 isn't printed, it seems as if the error comes from the loop.

And of course, with all this I'm still testing on our frontend; I haven't even made it to submitting the job to the intended cluster yet.

I tried putting the .libPaths call in my .Rprofile, but I'm not sure it would be read on the cluster, and I can't even get it to be read on the frontend (I couldn't figure out where R looks for the file).
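(Per R's ?Startup documentation, the user profile is the file named by the R_PROFILE_USER environment variable if that is set, otherwise a file called .Rprofile in the current working directory, and otherwise .Rprofile in the home directory. So a minimal profile along these lines should get picked up:)

# ~/.Rprofile -- searched for as R_PROFILE_USER, then ./.Rprofile, then ~/.Rprofile
.libPaths('/home/myfolder/Rlib')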


Solution

  • It's much easier to install R packages into a "personal library", since it is used automatically and you then don't have to call .libPaths in your scripts at all. You can determine which directory this is by executing:

    > Sys.getenv('R_LIBS_USER')
    

    This will automatically be the first directory returned by .libPaths if it exists, so you don't have to worry about calling .libPaths at all.
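
    For example, a one-time setup could look like this (a sketch; R only adds R_LIBS_USER to .libPaths when the directory actually exists, which is why it is created first):

        # Create the personal library; R only puts R_LIBS_USER on .libPaths
        # once the directory exists, so it has to be created one time
        dir.create(Sys.getenv('R_LIBS_USER'), recursive = TRUE, showWarnings = FALSE)
        # Install straight into it; after a restart of R it becomes the default
        install.packages('doMPI', lib = Sys.getenv('R_LIBS_USER'))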

    Note that there's no point in calling .libPaths in the body of the foreach loop since doMPI must be loaded by the cluster workers before they can execute any tasks.
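
    (As an alternative to a personal library, and an assumption on my part rather than part of the advice above: the R_LIBS environment variable is read at R startup and prepended to .libPaths, so exporting it before mpirun would also make your library visible to every worker before doMPI loads:)

        # Hypothetical alternative: point R at the library via the environment
        export R_LIBS=/home/myfolder/Rlib
        # With Open MPI, add "-x R_LIBS" if workers run on other nodes
        mpirun -H localhost -n 3 R --slave -f "3 - krmh mcmcglmm scc test 2.r"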

    I'm not sure what's going wrong in your mpirun case. There, mpirun starts all of the workers itself, so the first four lines of your script are executed by every process; that is why "Step 1" is displayed three times. In your second case, however, the cluster workers are spawned, and the doMPI package is loaded by the RMPIworker.R script, which knows nothing about your .libPaths call, resulting in the error loading doMPI.

    I suggest that you use the mpirun approach to solve the .libPaths problem, but call startMPIcluster with the verbose=TRUE option. That will create some files in your working directory named "MPI_*.log" which may contain some useful error messages that will provide a clue to the problem.
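
    In script form that's just a one-argument change (a sketch of the same script header):

        .libPaths('/home/myfolder/Rlib')  # still needed until a personal library is set up
        library(doMPI)
        # verbose=TRUE makes the workers write MPI_*.log files to the working directory
        cl <- startMPIcluster(verbose=TRUE)
        registerDoMPI(cl)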