I'm using some quick R script to cbind files after performing quantification (kallisto/salmon).
The problem is, I get an R error, saying my input files aren't the same length so cbind() won't work.
This is obviously not the case, they all are 16887 lines (checked with bash wc), and it works perfectly fine in R without snakemake.
Also worth mentioning, I do get an output for a random number of sample (~ 1 to 4).
Here's the R code :
sample <- read.table(snakemake@input[[1]], sep = "\t", header = TRUE)
if (!file.exists(snakemake@params[[1]])) {
merge <- data.frame(sample)
} else {
merge1 <- read.table(snakemake@params[[1]], sep = "\t", header = TRUE)
merge <- cbind(merge1, sample[,2, drop=F])
}
write.table(merge, snakemake@params[[1]], sep = "\t", quote = F, row.names = F)
file.create(snakemake@output[[1]])
And the snakemake rule :
rule merge_quantif:
input:
QUANTIF+"/{sample}/quantif.txt"
output:
QUANTIF+"/{sample}/merge.done"
params:
QUANTIF+"/all_sample_quantified.txt"
script:
"Tools/merge_quantif.R"
My files are like this :
Gene R1
A1BG 0.287571
A1CF 0
A2M 0.198756
A2ML1 0
A2MP1 0
A3GALT2 0
A4GALT 3.098108
And the output should be like this, but with all 17 samples
Gene R4 R8 R15 R13
A1BG 0.337515 0.284943 0.488654 0.587114
A1CF 0 0 0 0
A2M 0 0 0.105159 0.009539
If anyone has an idea to sort this out, will be gladly appreciated.
------------ EDIT Modified R code to go with the asnwer:
sample <- snakemake@input
merge <- read.table(sample[[1]], sep = "\t", header = T)
for (i in 2:length(sample)) {
x <- read.table(sample[[i]], sep = "\t", header = T)
merge <- merge(merge, x, by="Gene")
}
write.table(merge, snakemake@output[[1]], sep = "\t", quote = F, row.names = F)
The problem, I think, is that rule merge_quantif
is executed for each sample, i.e. 17 times, possibily in parallel. However, each run of merge_quantif
writes to the same output file (QUANTIF+"/all_sample_quantified.txt"
) since in your R script you have write.table(merge, snakemake@params[[1]], ...)
. I suspect this causes problems or at least is not an ideal setup. I suspect what you want is something like:
rule merge_quantif:
input:
expand(QUANTIF+"/{sample}/quantif.txt", sample= list_of_samples)
output:
QUANTIF+"/all_sample_quantified.txt"
script:
"Tools/merge_quantif.R"
Where Tools/merge_quantif.R
reads the list of input files one by one, merges them and finally writes the merged file to QUANTIF+"/all_sample_quantified.txt"