I have an analysis pipeline that I want to run many times with different config files. I do not want to have to encode the differences between the configs in the file paths, as the paths would become extremely long (in the real example I will iterate over many different variables).
If I have a config file config.yaml:
workdir: /path/to/workdir
variable: m
slice: 0
and an analysis workflow analysis.smk:
configfile: "config.yaml"
workdir: config['workdir']
rule all:
    input:
        expand("{sample}.txt", sample=[0, 1])
rule run:
    output:
        "{sample}.txt"
    shell:
        f"echo {config['variable']} {config['slice']} {{wildcards.sample}} > {{output}}"
I want to have a Snakefile where I differentiate between runs by specifying a unique tag to use as the working directory:
from itertools import product
configfile: "config.yaml"
# Vars that would be loaded from some other config file
experiment_dir = "/path/to/workdir/test"
tag = "run"
features = ['m', 'm_t21']
slices = [0, 1, 2]
all_outputs = []
for i, (feature, slc) in enumerate(product(features, slices)):
    run_tag = f'{tag}_{i}'
    # Update the config with the feature
    config["feature"] = feature
    config["slice"] = slc
    # Specify a unique run tag
    config["workdir"] = f'{experiment_dir}/{run_tag}'
    # Include the analysis pipeline
    module analysis_pipeline:
        snakefile:
            "analysis.smk"
        config: config
    # Prefix the imported rules with the feature (can't dynamically assign name)
    use rule * from analysis_pipeline as f'{run_tag}'*
    all_outputs += [f'{config["output_dir"]}/{output}' for output in rules.run_all.input]
rule all:
    input:
        *all_outputs,
    default_target: True
where I would have many more variables to iterate over, as with slices and features. The problem is that I can't dynamically assign rule names. How should I structure my workflows to get the functionality that I want?
It is possible to dynamically assign rule names: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html (see the name directive).
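As a minimal sketch of the name directive, assuming a hypothetical list of run tags and a placeholder output file (neither is taken from the question's real workflow), an anonymous rule defined in a loop can be given a computed name like this:

```snakemake
# Hypothetical run tags, for illustration only
run_tags = ["run_0", "run_1"]

for run_tag in run_tags:
    # An anonymous rule; the name directive assigns its name at parse time
    rule:
        name:
            f"analysis_{run_tag}"
        output:
            # Placeholder output path, one directory per run tag
            f"{run_tag}/result.txt"
        shell:
            "touch {output}"
```

Each pass through the loop defines a separate rule (here analysis_run_0 and analysis_run_1), so the rule names no longer have to be written out by hand.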
However, in this case, the more canonical approach is to add a wildcard for each variable of interest to the rules that you want to run for each sample, instead of re-importing the module for every combination. Also note that Snakemake has a special helper for exploring parameter spaces like this (https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#parameter-space-exploration).
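A rough sketch of the wildcard approach, using the variable and slice values from the question, could look like the following. The results/... directory layout and the wildcard name slc are assumptions made for illustration, not part of the original workflow:

```snakemake
# Values from the question; in practice these could be read from a config file
features = ["m", "m_t21"]
slices = [0, 1, 2]
samples = [0, 1]

rule all:
    input:
        # expand() generates one target per combination of the three wildcards
        expand(
            "results/{variable}/slice_{slc}/{sample}.txt",
            variable=features,
            slc=slices,
            sample=samples,
        ),

rule run:
    output:
        # Hypothetical layout: one wildcard per variable of interest
        "results/{variable}/slice_{slc}/{sample}.txt"
    shell:
        "echo {wildcards.variable} {wildcards.slc} {wildcards.sample} > {output}"
```

With this structure a single rule covers every combination, and the parameter values are read from the wildcards rather than from a config that is rewritten on each loop iteration.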