
Apply a snakemake rule to a list of values in parallel


I'm currently exploring using snakemake as a workflow tool.

In my use case, I start not from a list of files but from a list of values, each of which should result in the creation of a file.

In my example, I create the files with a small Python snippet, which works fine. But when I want to process those files in parallel in a second rule, they are concatenated into one parameter:


data = ['one', 'two', 'three']  # values inferred from the printed output

rule all:
    input:
        expand('{file}.bar', file=data)

rule foo:
    output:
        expand('{file}.foo', file=data)
    run:
        for item in data:
            with open(f'{item}.foo', 'w') as fout:
                fout.write('foo')

rule bar:
    input:
        file=expand('{file}.foo', file=data)
    output:
        outfile=expand('{file}.bar', file=data)
    shell:
        """echo {output.outfile};echo bar > {output.outfile} """

The example prints

"one.bar two.bar three.bar"

all at once, so the rule is applied only once, and then snakemake raises an error because the expected output files are not created.


Solution

  • You probably want rule bar without the expand calls:

    rule bar:
        input:
            file='{file}.foo',
        output:
            outfile='{file}.bar',
        shell:
            """echo {output.outfile};echo bar > {output.outfile} """
    

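    This is because the rule should apply to each {file} individually, not to all of them at once. Snakemake then infers the value of {file} for each job from the targets requested in rule all. You would use expand if, for example, you wanted to concatenate all the {file}s into a single file.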

    By the same reasoning, you could also change rule foo to:

    rule foo:
        output:
            '{file}.foo',
        run:
            item = wildcards.file
            with open(f'{item}.foo', 'w') as fout:
                fout.write('foo')
    

    (not tested)
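
To illustrate why the two versions of rule bar behave differently, here is a plain-Python sketch that imitates the behaviour without importing snakemake. It is a simplified stand-in, not snakemake's actual implementation: the helper match_wildcards is hypothetical, the values in data are assumed from the output printed in the question, and the matching only handles patterns in which each wildcard name appears once.

```python
import re

# Assumed sample values, inferred from the "one.bar two.bar three.bar" output.
data = ['one', 'two', 'three']


def match_wildcards(pattern, target):
    """Return the wildcard values if target fits pattern, else None.

    Hypothetical, simplified stand-in for snakemake's matching; it assumes
    each wildcard name appears only once in the pattern.
    """
    # Splitting on '{name}' leaves literal text at even indices and
    # wildcard names at odd indices.
    parts = re.split(r'\{(\w+)\}', pattern)
    regex = ''.join(
        f'(?P<{part}>.+)' if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None


# Roughly what expand('{file}.bar', file=data) produces: a list of names.
targets = [f'{name}.bar' for name in data]

# Original rule bar: the output is the whole list, so a single job receives
# every name, rendered space-separated in the shell command.
single_job = 'echo bar > ' + ' '.join(targets)

# Rewritten rule bar: each target is matched against '{file}.bar' separately,
# and the wildcard value fills in the input pattern '{file}.foo'.
jobs = []
for target in targets:
    wildcards = match_wildcards('{file}.bar', target)
    source = '{file}.foo'.format(**wildcards)
    jobs.append((source, f'echo bar > {target}'))

print(single_job)
for source, command in jobs:
    print(source, '->', command)
```

Because the jobs in the second form do not depend on each other, snakemake can schedule them in parallel when it is given more than one core, for example with snakemake --cores 3.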