Tags: r, bioinformatics, snakemake

Modifying R script to use command line arguments for use in Snakemake


I wrote this little R script to plot DNA sequence coverage data. It takes as input all the .txt files in a directory.

coverage.files<-list.files("~/coverage_plotting", full.names = TRUE, pattern = ".txt")
coverage.names<-list.files("~/coverage_plotting", full.names = F, pattern=".txt")
pdf.files <- gsub("txt","pdf", coverage.files)
plot.colors <- c("red","blue","green","yellow","purple")
for(i in seq_along(coverage.files)) {
  coverage <- read.delim(coverage.files[i])
  pdf(pdf.files[i], width = 5, height= 4)
  colnames(coverage) <- c("contig", "position", "coverage")
  contigs <- unique(coverage[,1])
  plot(-100,-100, xlim=c(0,800), ylim=c(0,500000), xlab="Coverage", ylab="Number of basepairs")
  for(j in contigs) {
    contig.cov <- subset(coverage,contig==j)
    cov.hist <- hist(contig.cov$coverage, breaks=seq(0,5000, by = 2), plot=F)
    points(cov.hist$mids, cov.hist$counts, type="p", col=plot.colors[j], pch=19, cex=0.5)
  }
  dev.off()
}

I now want to include the script in a Snakemake file, so I wanted to change it to take a single file as input from the command line. I found commandArgs() and tried to use that, and I removed the outer loop because only a single file is passed in at a time now. I ended up with something that looks like this:

coverage.file <- commandArgs()
pdf.file <- gsub("txt","pdf", coverage.file)
plot.colors <- c("red","blue","green","yellow","purple")
coverage <- read.delim(coverage.file)
pdf(pdf.file, width = 5, height= 4)
colnames(coverage) <- c("contig", "position", "coverage")
contigs <- unique(coverage[,1])
plot(-100,-100, xlim=c(0,800), ylim=c(0,500000), xlab="Coverage", ylab="Number of basepairs")
for(j in contigs) {
  contig.cov <- subset(coverage,contig==j)
  cov.hist <- hist(contig.cov$coverage, breaks=seq(0,5000, by = 2), plot=F)
  points(cov.hist$mids, cov.hist$counts, type="p", col=plot.colors[j], pch=19, cex=0.5)
}
dev.off()

When I run it, I get the following error:

Error in file(file, "rt") : cannot open the connection
Calls: read.delim -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'coverage.file': No such file or directory
Execution halted

Does anyone have advice on how I should modify this to take a single input file from the command line?

Thanks


Solution

  • The R documentation for commandArgs() says:

    Value

    A character vector containing the name of the executable and the user-supplied command line arguments. The first element is the name of the executable by which R was invoked. The exact form of this element is platform dependent: it may be the fully qualified name, or simply the last component (or basename) of the application, or for an embedded R it can be anything the programmer supplied. If trailingOnly = TRUE, a character vector of those arguments (if any) supplied after --args.

    see https://www.rdocumentation.org/packages/base/versions/3.0.3/topics/commandArgs

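    So your object coverage.file is a character vector that holds the executable name along with the arguments, not just your file name. Call commandArgs() with trailingOnly=TRUE to get only the user-supplied arguments, then pick the one you want by its position in the vector. For example: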

    args <- commandArgs(trailingOnly=TRUE)
    # access the i-th argument, depending on how you write the shell command in the Snakemake rule, e.g.:
    coverage.file <- args[1]
    ...
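
To make that concrete, here is a minimal sketch of how the top of the script could look. The helper name parse_coverage_args, the script name plot_coverage.R and the optional second argument for the output path are my own assumptions for illustration, not something from the question. It also swaps gsub("txt","pdf", ...) for sub("\\.txt$", ".pdf", ...) so that only a trailing .txt extension is replaced, since gsub would also rewrite "txt" anywhere else in the path.

```r
# Hypothetical helper (not from the question): turn the trailing command line
# arguments into an input path and an output path.
parse_coverage_args <- function(args) {
  if (length(args) < 1) {
    stop("usage: Rscript plot_coverage.R <coverage.txt> [output.pdf]")
  }
  coverage.file <- args[1]
  # Use the second argument as the output path if one was given;
  # otherwise replace a trailing ".txt" with ".pdf".
  pdf.file <- if (length(args) >= 2) {
    args[2]
  } else {
    sub("\\.txt$", ".pdf", coverage.file)
  }
  list(coverage.file = coverage.file, pdf.file = pdf.file)
}

# Only the user-supplied arguments, without the executable and R's own options.
args <- commandArgs(trailingOnly = TRUE)

# Guarded so that sourcing the file without arguments does nothing.
if (length(args) > 0) {
  paths <- parse_coverage_args(args)
  coverage <- read.delim(paths$coverage.file)
  pdf(paths$pdf.file, width = 5, height = 4)
  # ... plotting code from the question goes here, unchanged ...
  dev.off()
}
```

Assuming the script is saved as plot_coverage.R, the shell command in the Snakemake rule could then be something like Rscript plot_coverage.R {input} {output}, so that the first trailing argument is the coverage file and the second is the PDF to write.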