Running the same workflow many times with different configs without putting variables in the path


I have an analysis pipeline that I want to run many times with different config files. I do not want to encode the differences between the configs in the file path, because the path would become extremely long (in the real example I iterate over many different variables).

Suppose I have a config file, config.yaml:

workdir: /path/to/workdir
variable: m
slice: 0

and an analysis workflow, analysis.smk:

configfile: "config.yaml"
workdir: config['workdir']


rule all:
    input:
        expand("{sample}.txt", sample=[0, 1])


rule run:
    output:
        "{sample}.txt"
    shell:
        f"echo {config['variable']} {config['slice']} {{wildcards.sample}} > {{output}}"

I want to have a Snakefile where I differentiate between runs by specifying a unique tag to use as the working directory:

from itertools import product
configfile: "config.yaml"

# Vars that would be loaded from some other config file
experiment_dir = "/path/to/workdir/test"
tag = "run"
features = ['m', 'm_t21']
slices = [0, 1, 2]

all_outputs = []

for i, (feature, slc) in enumerate(product(features, slices)):
    run_tag = f'{tag}_{i}'
    # Update the config with the feature (analysis.smk reads it as config['variable'])
    config["variable"] = feature
    config["slice"] = slc
    # Specify a unique run tag
    config["workdir"] = f'{experiment_dir}/{run_tag}'
    # Include the analysis pipeline
    module analysis_pipeline:
        snakefile:
            "analysis.smk"
        config: config

    # Prefix the imported rules with the run tag (this is the part that fails:
    # the prefix can't be assigned dynamically)
    use rule * from analysis_pipeline as f'{run_tag}'*
    all_outputs += [f'{config["workdir"]}/{output}' for output in rules.run_all.input]


rule all:
    input:
        *all_outputs,
    default_target: True

In the real workflow I would iterate over many more variables besides features and slices. The problem is that I can't dynamically assign rule names. How should I structure my workflows to get the functionality that I want?


Solution

  • It is possible to dynamically assign rule names: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html (see the name directive).

    However, in this case the more idiomatic approach is to add a wildcard for each variable of interest to the rules that should run for every combination, instead of re-importing the module each time. Also note that Snakemake has a dedicated helper for exploring parameter spaces like this (https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#parameter-space-exploration).
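
Since the question asks to keep the parameter values out of the path, one variation on the wildcard idea is to use a single short run-tag wildcard and look the actual values up from a table. The plain-Python sketch below only illustrates that lookup; it is not taken from the answer. The names `runs`, `params_for`, and `samples` are hypothetical, and the lists mirror the ones in the question.

```python
from itertools import product

# Hypothetical parameter grid, mirroring the lists in the question.
features = ["m", "m_t21"]
slices = [0, 1, 2]
samples = [0, 1]

# Map a short run tag to its parameter combination, so that paths carry
# only the tag and not the parameter values themselves.
runs = {
    f"run_{i}": {"variable": feature, "slice": slc}
    for i, (feature, slc) in enumerate(product(features, slices))
}


def params_for(run_tag):
    """Return the parameter dict for a run tag (hypothetical helper)."""
    return runs[run_tag]


# Targets a top-level `all` rule could request: one directory per run tag.
all_outputs = [f"{tag}/{sample}.txt" for tag in runs for sample in samples]
```

In a Snakefile, this code could sit at the top of the file. A single rule with output `{run_tag}/{sample}.txt` could then read its values through a params function such as `lambda wildcards: params_for(wildcards.run_tag)["variable"]`. Writing `runs` out to a table next to the results would keep the tag-to-parameter mapping recoverable.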