Why can't I use split() with variable names in a Snakemake rule?


I don't understand why my parameters par1 and par2 come out the same despite the .split() call.

Here is a self-contained, runnable example.

You will need to run "touch id1-one.input id2-two.input" in the working directory first.

files=["id1-one", "id2-two"]

rule all:
    input:
        expand("{sample}.output",sample=files)
        
rule myrule:
    params:
        par1 = "{sample}",
        par2 = "{sample}".split("-")
    input:
        i = "{sample}.input"
    output:
        o = "{sample}.output"
    shell:
        "./myprog -i ${input.i} -o {output.o} par1: {params.par1} par2: {params.par2}"

Output from running is:

$ snakemake -s small3.smk --cores 10 -n -p
Building DAG of jobs...
Job stats:
job       count    min threads    max threads
------  -------  -------------  -------------
all           1              1              1
myrule        2              1              1
total         3              1              1


[Sat Dec 11 18:59:02 2021]
rule myrule:
    input: id2-two.input
    output: id2-two.output
    jobid: 2
    wildcards: sample=id2-two
    resources: tmpdir=/var/folders/jb/b9y_67gx3v727w68k7mgrpdm0000gn/T

./myprog -i $id2-two.input -o id2-two.output par1: id2-two par2: id2-two

[Sat Dec 11 18:59:02 2021]
rule myrule:
    input: id1-one.input
    output: id1-one.output
    jobid: 1
    wildcards: sample=id1-one
    resources: tmpdir=/var/folders/jb/b9y_67gx3v727w68k7mgrpdm0000gn/T

./myprog -i $id1-one.input -o id1-one.output par1: id1-one par2: id1-one

[Sat Dec 11 18:59:02 2021]
localrule all:
    input: id1-one.output, id2-two.output
    jobid: 0
    resources: tmpdir=/var/folders/jb/b9y_67gx3v727w68k7mgrpdm0000gn/T

Job stats:
job       count    min threads    max threads
------  -------  -------------  -------------
all           1              1              1
myrule        2              1              1
total         3              1              1

This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

Solution

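  • Python evaluates "{sample}".split("-") when the Snakefile is parsed, before Snakemake substitutes the wildcard. The literal string "{sample}" contains no "-", so split() returns it unchanged and par2 ends up identical to par1. Use a function of the wildcards inside params instead (the same mechanism as an input function):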

    rule myrule:
        params:
            par3 = lambda wildcards: wildcards.sample.split("-")
        ...
        shell:
            "par1: {params.par1} par2: {params.par2} par3: {params.par3} par3[0]: {params.par3[0]}"
    

    For id1-one, this expands to:

    par1: id1-one par2: id1-one par3: id1 one par3[0]: id1
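
    The difference between par2 and par3 comes down to when split() runs, and it can be reproduced in plain Python without Snakemake. The sketch below is only an illustration: str.format stands in for Snakemake's wildcard substitution, and the SimpleNamespace object is a hypothetical stand-in for the wildcards object Snakemake passes to a params function.

```python
from types import SimpleNamespace

# What `par2 = "{sample}".split("-")` does when the Snakefile is parsed:
# the literal text "{sample}" contains no "-", so split() returns it unchanged.
early = "{sample}".split("-")

# Stand-in for the later wildcard substitution (assumption: str.format is
# used here only to mimic it, not Snakemake's actual internals).
early_resolved = [piece.format(sample="id1-one") for piece in early]

# A function is called after the wildcards are known, so split() sees the
# real sample name.
def split_sample(wildcards):
    return wildcards.sample.split("-")

wildcards = SimpleNamespace(sample="id1-one")  # hypothetical stand-in object
late = split_sample(wildcards)

print(early)           # ['{sample}']
print(early_resolved)  # ['id1-one']
print(late)            # ['id1', 'one']
```

    In the first case the split has already happened by the time the sample name exists, so there is nothing left to split. Deferring the call with a lambda or a named function is what lets it run on the resolved value.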