Translating variables from bash to R


I need to use ExomeDepth, which has to be run from an R script.

Until now I have been running the bash script below. It reads the bestcoverage_E036 file, which contains the list of file name IDs, and retrieves the ID on the line whose number matches the job array task ID. It works great for bash tools.

#!/bin/bash --login
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH -p htc
#SBATCH --mail-type=ALL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --array=1-64

module load parallel
module load tool

EXOME_IDs_FILE=/home/bestcoverage_E036
INPUTFILE=/home/{}.bam

sed -n "${SLURM_ARRAY_TASK_ID}p" $EXOME_IDs_FILE | parallel -j 1 "tool $INPUTFILE"

However, I now need to use R for ExomeDepth. The documentation shows some of its use as:

data(exons.hg19)
my.counts <- getBamCounts(bed.frame = exons.hg19,
                          bam.files = my.bam,
                          include.chr = FALSE,
                          referenceFasta = fasta)

I would like to use my bash variables in these examples, so that, for instance, my.bam would be $INPUTFILE.

This obviously doesn't work, but the idea is something like this:

#!/bin/bash --login
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH -p htc
#SBATCH --mail-type=ALL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --array=1-64

module load parallel
module load tool

EXOME_IDs_FILE=/home/bestcoverage_E036
INPUTFILE=/home/{}.bam
HG38=/home/hg38.fasta
INPUTBEDFILE=/home/inputbed.bed

sed -n "${SLURM_ARRAY_TASK_ID}p" $EXOME_IDs_FILE | parallel -j 1 "data($INPUTBEDFILE)
                                                                 my.counts <- getBamCounts(bed.frame = $INPUTBEDFILE,
                                                                 bam.files = $INPUTFILE,
                                                                 include.chr = FALSE,
                                                                 referenceFasta = $HG38)

Does anyone know how to use bash variables in R code?


Solution

  • You need a way to pass command line arguments to your Rscript. There are a couple of libraries that help with that, and a very rudimentary base R function (commandArgs).

    With the latter you have to do a lot of parsing and sense checking yourself, while libraries like getopt help you a lot with common tasks.

    Having said that, here's an example using base R:

    cli.R

    ## base R very simple but a lot of manual parsing
    
    base_args <- commandArgs(TRUE)
    
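    run_rnorm <- function(n, mean = NA, sd = NA) {
       ## x %!% y: x as a number, or y if x is missing or not numeric
       `%!%` <- function(x, y) if (is.na(as.numeric(x))) y else as.numeric(x)
       args <- list(n = NULL,
                    mean = NULL,
                    sd = NULL)
       args$n <- as.numeric(n)
       ## assigning NULL drops the element, so rnorm() uses its own default
       args$mean <- mean %!% NULL
       args$sd <- sd %!% NULL
       do.call(rnorm, args)
    }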
    
    stopifnot(`At least one parameter is needed` = length(base_args) > 0)
    run_rnorm(base_args[1], base_args[2], base_args[3])
    

    Then you can call it from bash like this:

    Rscript cli.R 3
    

    Thus, you can now pass (bash) variables from a script like this

    Rscript cli.R "$myvariable"
    

    and in cli.R you can access it via commandArgs(TRUE)[1]. I do not know about parallel, so you will have to work out how to fit the two together.
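
For the array job in the question, parallel may not be needed at all, since each array task handles a single ID. Below is a minimal sketch of the bash side only, under some assumptions: exomedepth_counts.R is a hypothetical script name, the temporary IDs file is a stand-in for /home/bestcoverage_E036 so the sketch runs anywhere, and the task ID falls back to 2 when SLURM_ARRAY_TASK_ID is not set. The leading echo makes it a dry run that only prints the command.

```shell
# Task ID set by SLURM for each array element; the fallback of 2 is only
# here so the sketch can be tried outside a SLURM job (assumption).
task_id="${SLURM_ARRAY_TASK_ID:-2}"

# Stand-in for the real IDs file (/home/bestcoverage_E036 in the question).
ids_file="$(mktemp)"
printf '%s\n' sampleA sampleB sampleC > "$ids_file"

# Pick the line whose number equals the task ID, as in the original script.
exome_id="$(sed -n "${task_id}p" "$ids_file")"
rm -f "$ids_file"

# Build the paths in bash instead of using parallel's {} placeholder.
input_bam="/home/${exome_id}.bam"
input_bed="/home/inputbed.bed"
hg38="/home/hg38.fasta"

# Dry run: print the command. Remove "echo" to actually run the R script.
# exomedepth_counts.R is a hypothetical name for your own script.
echo Rscript exomedepth_counts.R "$input_bam" "$input_bed" "$hg38"
```

Inside the R script, the three values would then arrive in order as commandArgs(TRUE)[1], commandArgs(TRUE)[2] and commandArgs(TRUE)[3], and can be used wherever the documentation example has my.bam and fasta. Quoting each variable keeps a path containing spaces as a single argument.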