I have two rules that do practically the same thing. Is there a way to merge them into a single rule in Snakemake?
    rule labels:
        input: lambda wildcards: get_predict_data(datadir, wildcards.dataset, wildcards.day)
        output: dir / 'data' / '{dataset}' / '{day}' / 'labels.gz'
        shell: "zcat {input} | cut -f1 -d '|' | gzip > {output}"

    rule labels2:
        input: lambda wildcards: datadir / config['test_sets'][wildcards.testset]['path']
        output: dir / 'data' / '{testset}' / 'labels.gz'
        shell: "zcat {input} | cut -f1 -d '|' | gzip > {output}"
I tried, but it seems that the output cannot contain functions or optional wildcards.
Since rule labels has two wildcards and rule labels2 has only one, it is hard to merge them completely into a single rule. But you could try:
    use rule labels as labels2 with:
        input: lambda wildcards: datadir / config['test_sets'][wildcards.testset]['path']
        output: dir / 'data' / '{testset}' / 'labels.gz'
This still results in two rules, but it removes some of the code duplication, because labels2 inherits the shell command from labels.
Alternatively, you can give both cases the same wildcards and let an input function decide where the input comes from. It's not very elegant, but it is convenient:
    rule all:
        input:
            "result/20241113/dataset1/output",
            "result/test/testdataset/output",

    # Creates dummy inputs so the example runs on its own
    rule get_input:
        output:
            "result/20241113/dataset1/input",
            "test/testdataset/input",
        shell:
            "touch {output}"

    # "test" acts as a special value of the day wildcard for test sets
    def get_label_input(wc):
        if wc.day == "test":
            return f"test/{wc.dataset}/input"
        return f"result/{wc.day}/{wc.dataset}/input"

    rule labels:
        input: get_label_input
        output:
            "result/{day}/{dataset}/output"
        shell:
            "cat {input} > {output}"