
Snakemake with one input but multiple parameter permutations


I have been trying to wrap my head around this problem, which probably has a very easy solution. I am running a bioinformatics workflow where I have one file as input and I want to run a program on it. However, I want that program to be run with several combinations of parameters. Let me explain.

I have file.fastq and I want to run cutadapt (in the shell) with two flags: --trim and -e. I want to run --trim with the values 0 and 5, and -e with the values 0.1 and 0.5.

Therefore I want to run the following:
cutadapt file.fastq --trim0 -e0.5 --output ./outputs/trim0_error0.5/trimmed_file.fastq
cutadapt file.fastq --trim5 -e0.5 --output ./outputs/trim5_error0.5/trimmed_file.fastq
cutadapt file.fastq --trim0 -e0.1 --output ./outputs/trim0_error0.1/trimmed_file.fastq
cutadapt file.fastq --trim5 -e0.1 --output ./outputs/trim5_error0.1/trimmed_file.fastq

I thought snakemake would be perfect for this. So far I tried:

E = [0.1, 0.5]
TRIM = [5, 0]

rule cutadapt:
    input:
        "file.fastq"

    output:
        expand("../outputs/trim{TRIM}_error{E}/trimmed_file.fastq", E=E, TRIM=TRIM)
    
    params:
        trim = TRIM,
        e = E

    shell:
        "cutadapt {input} -e{params.e} --trim{params.trim} --output {output}"

However I get an error like this:

shell:
cutadapt file.fastq -e0.1 0.5 --trim0 5 --output {output}
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

So, as you can see, snakemake is not taking each element of the TRIM and E variables separately, but joining them into a single string. How could I solve this problem? Thank you in advance.


Solution

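  • When specifying params, right now you are providing the full lists rather than a single value per job, and because output uses expand, one job is asked to produce all four files at once. Instead, keep TRIM and E as wildcards in the rule's output and request the concrete files in a separate rule all. Contrast the two parameter values under params below: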

    E = [0.1, 0.5]
    TRIM = [5, 0]
    
    
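    # Request every TRIM/E combination as a final target.
    rule all:
        input: expand("../outputs/trim{TRIM}_error{E}/trimmed_file.fastq", E=E, TRIM=TRIM)
    
    rule cutadapt:
        input:
            "file.fastq"
        # TRIM and E are wildcards here, so each job handles one combination.
        output: "../outputs/trim{TRIM}_error{E}/trimmed_file.fastq"
        params:
            trim_list = TRIM,  # the full list [5, 0], identical for every job
            trim_value = lambda wildcards: wildcards.TRIM,  # the single value for this job
        shell:
            "cutadapt {input} -e{wildcards.E} --trim{wildcards.TRIM} --output {output}"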
    

    Note that the shell directive does not need to reference params at all, since it can read wildcards directly (as {wildcards.TRIM} and {wildcards.E}).
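
To see why this works, it may help to reproduce the two behaviours in plain Python. This is only a rough sketch: expand_like is a hypothetical stand-in written for illustration, not Snakemake's actual expand, and the ordering of its results is not claimed to match Snakemake's. It shows that a list dropped into a shell string ends up space-joined (as in the error above), while filling the pattern once per combination yields four separate targets, each with a single TRIM and a single E.

```python
from itertools import product

E = [0.1, 0.5]
TRIM = [5, 0]

# Same pattern as the rule's output; the braces mark the wildcards.
PATTERN = "../outputs/trim{TRIM}_error{E}/trimmed_file.fastq"


def expand_like(pattern, **values):
    """Hypothetical stand-in for expand(): fill the pattern with every combination."""
    keys = list(values)
    combos = product(*(values[key] for key in keys))
    return [pattern.format(**dict(zip(keys, combo))) for combo in combos]


# What the original rule effectively did: the whole list lands in one command.
joined = " ".join(str(e) for e in E)
print(f"cutadapt file.fastq -e{joined} ...")

# What rule all requests instead: one target path per (E, TRIM) pair.
targets = expand_like(PATTERN, E=E, TRIM=TRIM)
for target in targets:
    print(target)
```

Each printed path is then matched against the wildcard pattern in rule cutadapt, so Snakemake schedules one job per path and fills {wildcards.TRIM} and {wildcards.E} from it.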