I have been trying to wrap my head around this problem, which probably has a very easy solution. I am running a bioinformatics workflow where I have one file as input and I want to run a program on it. However, I want that program to be run with multiple parameter values. Let me explain.
I have file.fastq and I want to run cutadapt (in the shell) with two flags: --trim and -e. I want to run --trim with values --trim 0 and --trim 5. I also want to run -e with values -e 0.1 and -e 0.5.
Therefore I want to run the following:
cutadapt file.fastq --trim0 -e0.5 --output ./outputs/trim0_error0.5/trimmed_file.fastq
cutadapt file.fastq --trim5 -e0.5 --output ./outputs/trim5_error0.5/trimmed_file.fastq
cutadapt file.fastq --trim0 -e0.1 --output ./outputs/trim0_error0.1/trimmed_file.fastq
cutadapt file.fastq --trim5 -e0.1 --output ./outputs/trim5_error0.1/trimmed_file.fastq
I thought snakemake would be perfect for this. So far I tried:
E = [0.1, 0.5]
TRIM = [5, 0]

rule cutadapt:
    input:
        "file.fastq"
    output:
        expand("../outputs/trim{TRIM}_error{E}/trimmed_file.fastq", E=E, TRIM=TRIM)
    params:
        trim = TRIM,
        e = E
    shell:
        "cutadapt {input} -e{params.e} --trim{params.trim} --output {output}"
However I get an error like this:
shell:
cutadapt file.fastq -e0.1 0.5 --trim0 5 --output {output}
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
So, as you can see, snakemake is not taking each value of the TRIM and E variables separately, but is joining them together into a single string. How could I solve this problem? Thank you in advance.
When specifying params, right now you are providing full lists rather than specific values. Contrast the following parameter values:
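E = [0.1, 0.5]
TRIM = [5, 0]

# Request every combination of TRIM and E as a final target.
rule all:
    input:
        expand("../outputs/trim{TRIM}_error{E}/trimmed_file.fastq", E=E, TRIM=TRIM)

# One job per combination; the wildcards are filled in from the requested path.
rule cutadapt:
    input:
        "file.fastq"
    output:
        "../outputs/trim{TRIM}_error{E}/trimmed_file.fastq"
    params:
        # the whole list, identical for every job
        trim_list = TRIM,
        # a single value, taken from this job's wildcard
        trim_value = lambda wildcards: wildcards.TRIM,
    shell:
        "cutadapt {input} -e{wildcards.E} --trim{wildcards.TRIM} --output {output}"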
Note that in the shell directive there was no need to reference params, since this directive is aware of wildcards.
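To make the difference concrete, here is a rough plain-Python imitation of what happens in both cases. It is only a sketch and not Snakemake's actual implementation: render_list, matcher and command_for are hypothetical helpers made up for illustration, and the regular expression assumes each wildcard matches ".+" (Snakemake's default when no constraint is given).

```python
from itertools import product
import re

E = [0.1, 0.5]
TRIM = [5, 0]


def render_list(values):
    """Hypothetical helper: a list placed in a shell string ends up space-joined."""
    return " ".join(str(v) for v in values)


# Original rule: the whole lists are formatted into one single command.
broken = f"cutadapt file.fastq -e{render_list(E)} --trim{render_list(TRIM)}"

# rule all: expand() yields one target path per (E, TRIM) combination.
pattern = "../outputs/trim{TRIM}_error{E}/trimmed_file.fastq"
targets = [pattern.format(TRIM=t, E=e) for e, t in product(E, TRIM)]

# rule cutadapt: each target is matched against the output pattern, which
# gives that job exactly one value per wildcard (assumed regex: ".+").
matcher = re.compile(
    r"\.\./outputs/trim(?P<TRIM>.+)_error(?P<E>.+)/trimmed_file\.fastq"
)


def command_for(target):
    """Hypothetical helper: build the shell command for one requested target."""
    wildcards = matcher.fullmatch(target).groupdict()
    return (
        f"cutadapt file.fastq -e{wildcards['E']} "
        f"--trim{wildcards['TRIM']} --output {target}"
    )


commands = [command_for(t) for t in targets]

if __name__ == "__main__":
    print("single command built from lists:")
    print(" ", broken)
    print("one command per combination:")
    for c in commands:
        print(" ", c)
```

The first command contains both values of each flag at once, which is what cutadapt choked on. The second group has four separate commands, one for each output directory, because every requested path carries its own TRIM and E.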