I have an R script, myscript.R, that uses a configuration file, e.g. config.xml. What is the best way to submit such a script to a job scheduler (e.g., using qsub)?
I would like to be able to use the script and file in the same way that I would use, e.g., a C or Fortran executable, which is embedded in a bash script.
Here is an example of the approach that I use with a compiled Fortran executable, fex, wrapped in a bash script that I will call fscript.sh:
#!/bin/bash
mpirun [arguments] "fex" -f "$1"
The above fscript.sh can be sent to a cluster with instructions to read the config file like this:
qsub [arguments] fscript.sh 1 config.xml
To run R in an analogous way, I am using a bash script, rscript.sh:
#!/bin/bash
CONFIG=$1
# Pass the config path to R through the CONFIG environment variable
env CONFIG="$CONFIG" R --vanilla < myscript.R
This can be run at the command line, e.g.
qsub [arguments] rscript.sh config.xml
Here, myscript.R contains something like:
library(XML)
config_file <- Sys.getenv("CONFIG")          # path passed in by rscript.sh
config <- xmlToList(xmlParse(config_file))   # parse the XML into an R list
myfunction(config)
In addition to coming up with the bash script rscript.sh described above, I have read through tutorials and some documentation for Rscript and compiler, but it is not clear to me in which contexts one would be preferred over the other. It is also not clear what the best way is to pass a configuration file in either context.
This question is related to others, e.g., "What are the ways to create an executable from R program" and "Does an R compiler exist?". However, I do not think that it is essential to use compiled code.
What does compiler have to do with anything? It compiles R code into byte-code for the R interpreter, so it may not do what you expect.
For scripting, use Rscript (available everywhere), or littler (which predates Rscript).
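As a minimal sketch of the Rscript route, the wrapper below passes the config file as a trailing command-line argument, which myscript.R could then read with commandArgs(trailingOnly = TRUE). The function name run_r_job and the RSCRIPT override variable are hypothetical, introduced only for illustration, and whether qsub forwards trailing arguments to the job script depends on the scheduler, so treat this as an assumption to check on your cluster.

```shell
#!/bin/bash
# rscript.sh: hypothetical wrapper around Rscript (a sketch, not a definitive setup).

# Validate the config path, then hand it to Rscript as a trailing argument.
# Inside myscript.R the path is available as commandArgs(trailingOnly = TRUE)[1].
run_r_job() {
    local config="$1"
    if [ -z "$config" ]; then
        echo "usage: rscript.sh <config-file>" >&2
        return 2
    fi
    if [ ! -r "$config" ]; then
        echo "cannot read config file: $config" >&2
        return 1
    fi
    # RSCRIPT is an assumed override (e.g. a full path on the cluster);
    # it defaults to whatever Rscript is found on PATH.
    "${RSCRIPT:-Rscript}" --vanilla myscript.R "$config"
}

# Only run the job when a config path was actually supplied.
if [ "$#" -gt 0 ]; then
    run_r_job "$@"
fi
```

With this wrapper, the submission line stays the same as before (qsub [arguments] rscript.sh config.xml), and the R script no longer needs Sys.getenv because the path arrives as an ordinary argument.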
We actually wrote littler explicitly for this scripting purpose, and my "Intro to HPC with R" talks (see the presentations page) have examples of submitting such scripts to the slurm scheduler / resource manager (as I never had access to qsub).
There are many other questions here relating to Rscript and command-line parsing. That should get you started.