With Snakemake, I'm trying to apply the same rule to N independent files and to a file that is the merge of all N files.
I've created a minimal example, shown below, that does the following:
My initial input is a set of files whose paths are given in a configuration file I have little control over.
First, I extract a specific part of each file (rule create_list), which I then process (rule do_stuff_on_list).
This works just fine. What I'm having trouble with is merging all the "lists" together (rule merge_lists) and applying the exact same processing (rule do_stuff_on_list) to the result.
config_file = {
    "result_files": [
        {
            "id": 0,
            "path": "/path/to/readonly/location/1.txt"
        },
        {
            "id": 8,
            "path": "/path/to/readonly/location/2.txt"
        },
        {
            "id": 4,
            "path": "/path/to/readonly/location/3.txt"
        }
    ]
}

SAMPLES = {str(x["id"]): x["path"] for x in config_file["result_files"]}

rule all:
    input:
        "AAA_finalResult.txt"

rule create_list:
    input:
        sample_path = lambda wildcards: SAMPLES[wildcards.sample]
    output:
        "{sample}_mut_list.json"
    shell:
        "touch {output}"

rule merge_lists:
    input:
        expand(rules.create_list.output, sample=SAMPLES.keys())
    output:
        "merged_mut_list.json"
    shell:
        "touch {output}"

rule do_stuff_on_list:
    input:
        rules.create_list.output
    output:
        "{sample}_stuff.json"
    shell:
        "touch {output}"

rule merge_all_results:
    input:
        expand(rules.do_stuff_on_list.output, sample=SAMPLES.keys()),
    output:
        "AAA_finalResult.txt"
    shell:
        "touch {output}"

I know I could solve this by creating a second rule identical to do_stuff_on_list that takes the merged file as input, but I feel there should be a better way and I cannot figure it out.
Is there a way to do this?
Rule inheritance might solve the problem for you. Roughly:
use rule do_stuff_on_list as do_stuff_on_merged_list with:
    input:
        rules.merge_lists.output
    output:
        "{sample}_merged_stuff.json"
Note that, as the documentation states, rule inheritance can modify any part of the rule except the actual execution step (shell, in your example).
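
One thing to watch for with the inherited rule is its output name. As far as I know, an unconstrained Snakemake wildcard matches any non-empty string (regex .+), so a file such as 0_merged_stuff.json can also be claimed by do_stuff_on_list with sample = "0_merged", which may produce an ambiguous-rule error. The plain-Python sketch below only imitates that matching in a simplified way; pattern_to_regex and matching_rules are hypothetical helpers, not Snakemake functions, and the \d+ constraint is an assumption based on the ids in your config being integers.

```python
import re

def pattern_to_regex(pattern, constraints=None):
    """Compile an output pattern like "{sample}_stuff.json" into a regex.

    Simplified stand-in for Snakemake's matching: each {wildcard} becomes a
    named group using its constraint, or ".+" when it has none.
    """
    constraints = constraints or {}
    # Splitting on a capturing group yields: literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)  # literal text between wildcards
        else:
            regex += "(?P<%s>%s)" % (part, constraints.get(part, ".+"))
    return re.compile(regex)

def matching_rules(target, outputs, constraints=None):
    """Return the names of all rules whose output pattern matches target."""
    return [
        name
        for name, pattern in outputs.items()
        if pattern_to_regex(pattern, constraints).fullmatch(target)
    ]

# Output patterns of the original rule and of the inherited rule above.
OUTPUTS = {
    "do_stuff_on_list": "{sample}_stuff.json",
    "do_stuff_on_merged_list": "{sample}_merged_stuff.json",
}

target = "0_merged_stuff.json"
unconstrained = matching_rules(target, OUTPUTS)
constrained = matching_rules(target, OUTPUTS, {"sample": r"\d+"})

print("without constraint:", unconstrained)
print("with sample = \\d+:", constrained)
```

In this sketch both rules match the target when sample is unconstrained, and only do_stuff_on_merged_list matches once sample is restricted to digits. In the Snakefile itself, the corresponding tool would be the wildcard_constraints directive (or a ruleorder statement), but check the documentation for your Snakemake version before relying on this.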