Tags: r, parallel-processing, hpc, qsub, torque

Submit jobs to a slave node from within an R script?


I want to get myscript.R to run on a cluster slave node using a job scheduler (specifically, PBS).

Currently, I submit an R script to a slave node using the following command:

qsub -S /bin/bash -p -1 -cwd -pe mpich 1 -j y -o output.log ./myscript.R

Are there functions in R that would allow me to run myscript.R on the head node and send individual tasks to the slave nodes? Something like:

foreach(i=c('file1.csv', 'file2.csv'), pbsoptions=list()) %do% read.csv(i)

Update: an alternative to the qsub command above, as pointed out by @Josh, is to remove #!/usr/bin/Rscript from the first line of myscript.R and call it directly:

qsub -S /usr/bin/Rscript -p -1 -cwd -pe mpich 1 -j y -o output.log myscript.R

Solution

  • If you want to submit jobs from within an R script, I suggest that you look at the "BatchJobs" package. Here is a quote from the DESCRIPTION file:

    Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine.

    BatchJobs appears to be more sophisticated than previous, similar packages, such as Rsge and Rlsf. There are functions for registering, submitting, and retrieving the results of jobs. Here's a simple example:

    library(BatchJobs)
    reg <- makeRegistry(id='test')
    batchMap(reg, sqrt, x=1:10)
    submitJobs(reg)
    y <- loadResults(reg)
    

    You need to configure BatchJobs to use your batch queueing system. The submitJobs "resources" argument can be used to request appropriate resources for the jobs.
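
    As a rough sketch, configuration goes in a .BatchJobs.R file that the package reads on startup, and per-submission resources are passed to submitJobs. The template path and the resource names below are assumptions; they depend on the Torque/PBS template your site uses:

    ## .BatchJobs.R -- read by BatchJobs when the package is loaded
    ## (template path and resource names are placeholders; adapt to your cluster)
    cluster.functions <- makeClusterFunctionsTorque("~/torque.tmpl")
    default.resources <- list(walltime=3600, memory=1024)

    ## per submission, override the defaults as needed
    submitJobs(reg, resources=list(walltime=7200, memory=2048))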

    This approach is very useful if your cluster doesn't allow very long running jobs, or if it severely restricts the number of long running jobs. BatchJobs allows you to get around those restrictions by breaking up your work into multiple jobs while hiding most of the work associated with doing that manually.
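
    For example, the read.csv case from the question could be expressed as one job per file. This is only a sketch using the hypothetical file names from the question:

    library(BatchJobs)
    files <- c('file1.csv', 'file2.csv')   # hypothetical inputs from the question
    reg <- makeRegistry(id='readcsv')
    batchMap(reg, read.csv, file=files)    # one job per CSV file
    submitJobs(reg)                        # hand the jobs to the scheduler
    waitForJobs(reg)                       # block until they finish
    results <- loadResults(reg)            # list of data frames, one per file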

    Documentation and examples are available at the project website.