python python-3.x snakemake directed-acyclic-graphs

Use the same rule at multiple locations in a pipeline with and without wildcards


With Snakemake, I'm trying to apply the same rule both to N independent files and to a file that is the merge of all N of them.

I've created a minimal example, shown below, that produces this DAG:
[Figure: DAG of the example workflow]

My initial input is a set of files whose paths are given in a configuration file I have little control over.

First, I extract a specific part of each of those files (rule create_list), and then I process it (rule do_stuff_on_list).

This works fine. What I'm having trouble with is merging all the "lists" together (rule merge_lists) and then applying the exact same processing (rule do_stuff_on_list) to the merged list.

config_file = {
    "result_files": [
        {
            "id": 0,
            "path": "/path/to/readonly/location/1.txt"
        },
        {
            "id": 8,
            "path": "/path/to/readonly/location/2.txt"
        },
        {
            "id": 4,
            "path": "/path/to/readonly/location/3.txt"
        }
    ]
}

SAMPLES = {str(x["id"]): x["path"] for x in config_file["result_files"]}

rule all:
    input:
        "AAA_finalResult.txt"

rule create_list:
    input:
        sample_path = lambda wildcards: SAMPLES[wildcards.sample]
    output:
        "{sample}_mut_list.json"
    shell:
        "touch {output}"

rule merge_lists:
    input:
        expand(rules.create_list.output, sample=SAMPLES.keys())
    output:
        "merged_mut_list.json"
    shell:
        "touch {output}"

rule do_stuff_on_list:
    input:
        rules.create_list.output
    output:
        "{sample}_stuff.json"
    shell:
        "touch {output}"

rule merge_all_results:
    input:
        expand(rules.do_stuff_on_list.output, sample=SAMPLES.keys()),
    output:
        "AAA_finalResult.txt"
    shell:
        "touch {output}"
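To make the file names in this example concrete, the plain-Python sketch below imitates what the two expand(...) calls resolve to for this config. expand_single is a hypothetical helper that only handles a single wildcard; Snakemake's real expand also supports several wildcards and combinatoric expansion, so treat this as an illustration rather than its implementation.

```python
# Same inline config as in the Snakefile above.
config_file = {
    "result_files": [
        {"id": 0, "path": "/path/to/readonly/location/1.txt"},
        {"id": 8, "path": "/path/to/readonly/location/2.txt"},
        {"id": 4, "path": "/path/to/readonly/location/3.txt"},
    ]
}

# Map each id (as a string, since wildcard values are strings) to its input path.
SAMPLES = {str(x["id"]): x["path"] for x in config_file["result_files"]}


def expand_single(pattern, **wildcards):
    """Hypothetical stand-in for Snakemake's expand() with exactly one wildcard.

    Fills `pattern` once per value of the single keyword argument,
    keeping the iteration order of the values.
    """
    (name, values), = wildcards.items()  # fails unless exactly one wildcard is given
    return [pattern.format(**{name: v}) for v in values]


# Inputs of rule merge_lists: one list file per sample.
merge_inputs = expand_single("{sample}_mut_list.json", sample=SAMPLES.keys())
# Inputs of rule merge_all_results: one processed file per sample.
final_inputs = expand_single("{sample}_stuff.json", sample=SAMPLES.keys())

print(merge_inputs)  # ['0_mut_list.json', '8_mut_list.json', '4_mut_list.json']
print(final_inputs)  # ['0_stuff.json', '8_stuff.json', '4_stuff.json']
```

Note that in the Snakefile as written, nothing downstream of rule all consumes merged_mut_list.json, so Snakemake has no reason to schedule merge_lists: it only runs jobs needed to produce the requested targets.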

I know I could solve this by creating a second rule, identical to do_stuff_on_list, that takes the merged list as input. But I feel there should be a better way, and I can't figure it out.

Is there a way to do this?


Solution

  • Rule inheritance (available since Snakemake 6.0) might solve the problem for you. Roughly:

    use rule do_stuff_on_list as do_stuff_on_merged_list with:
        input: rules.merge_lists.output,
        output: "{sample}_merged_stuff.json",
    

    Note that, as the documentation states, rule inheritance lets you modify any part of the rule except the actual execution step (in your example, shell). Also remember that the new rule only runs if its output is requested somewhere, for example from rule all.
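One caveat worth checking with this approach: Snakemake matches wildcards with the regular expression .+ by default, so an output such as 0_merged_stuff.json also fits do_stuff_on_list's pattern {sample}_stuff.json (with sample="0_merged"), which can make both rules candidates for the same file. The plain-Python sketch below imitates that matching with the re module; pattern_to_regex and match are hypothetical helpers, not Snakemake's own code, and they only handle bare {name} wildcards. It also shows how restricting sample to digits (an assumption based on the numeric ids in the example config) removes the overlap.

```python
import re


def pattern_to_regex(pattern, constraints=None):
    """Hypothetical helper: turn a pattern like '{sample}_stuff.json' into a regex.

    Each {name} becomes a named group using constraints[name] if given,
    otherwise '.+' (Snakemake's default wildcard regex). Literal text is escaped.
    """
    constraints = constraints or {}
    # With a capturing group, re.split alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)  # literal text between wildcards
        else:
            regex += f"(?P<{part}>{constraints.get(part, '.+')})"
    return re.compile(regex)


def match(pattern, filename, constraints=None):
    """Return the wildcard values if filename fully matches pattern, else None."""
    m = pattern_to_regex(pattern, constraints).fullmatch(filename)
    return m.groupdict() if m else None


# Unconstrained: the merged output also fits do_stuff_on_list's pattern.
print(match("{sample}_stuff.json", "0_merged_stuff.json"))  # {'sample': '0_merged'}

# Constrained to digits: only the inherited rule's pattern matches.
digits = {"sample": r"\d+"}
print(match("{sample}_stuff.json", "0_merged_stuff.json", digits))  # None
print(match("{sample}_merged_stuff.json", "0_merged_stuff.json", digits))  # {'sample': '0'}
```

In a Snakefile, the same restriction would be written with a wildcard_constraints: directive, either at the top level or inside a rule, e.g. sample=r"\d+".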