Snakemake rules in standardized workflows run Python scripts using the script directive, such as this template rule:

rule XXXXX:
    input:
        ...,
    output:
        ...,
    params:
        ...,
    conda:
        "../envs/python.yaml"
    script:
        "../scripts/XXXX.py"
Then, inside the script, the snakemake object is available. However, the script is then tightly coupled to that rule, which seems like a big disadvantage.
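To make the coupling concrete: when Snakemake runs a file via the script directive, it injects a global snakemake object carrying the rule's input, output, and params. The sketch below mimics that object with a SimpleNamespace so it runs standalone; the file names and the threshold parameter are invented for illustration, not part of any real workflow.

```python
from types import SimpleNamespace

try:
    # Provided automatically by Snakemake when run via the script directive.
    snakemake
except NameError:
    # Standalone stand-in so the sketch can run outside Snakemake
    # (hypothetical values, for illustration only).
    snakemake = SimpleNamespace(
        input=["counts.tsv"],
        output=["summary.tsv"],
        params=SimpleNamespace(threshold=10),
    )

# This is the coupling in question: the script assumes the rule defines
# exactly these inputs, outputs, and named params.
in_file = snakemake.input[0]
out_file = snakemake.output[0]
threshold = snakemake.params.threshold
```

The upside is that no argument parsing is needed; the downside, as noted above, is that the script only makes sense when run from a rule with a matching interface.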
Why is this approach preferred over using the shell directive to call the script, as in this rule?
rule XXXXX:
    input:
        ...,
    output:
        ...,
    params:
        absolute_script_path = ..., # get
        argument1 = ...,
    conda:
        "../envs/python.yaml"
    shell:
        "python {params.absolute_script_path} {input} {params.argument1} > {output}"
In this approach, the Python script is decoupled from the Snakemake rule. It also looks more cohesive, as the arguments being passed are clear from the rule rather than hidden in the script. I am only starting to write Snakemake workflows, so I am just a beginner. I do not understand why the first approach is preferred (or used in standardized Snakemake workflows) over the second. Am I missing something? Are there problems with the second approach? Thank you very much for any answers!
The script approach is a bit more flexible in terms of the objects the script can access via the params and other directives.

If you follow the shell approach, you might find it cumbersome to (re)define argparse or some other mechanism to properly handle the arguments passed via the shell. It is mostly boilerplate, but it can get somewhat tedious.
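To illustrate the boilerplate in question, here is a minimal sketch of the argument parsing a shell-invoked script would need for the rule above. The argument names (in_file, threshold) are hypothetical; each script called this way has to repeat some variant of this.

```python
import argparse

def parse_args(argv=None):
    """Parse the positional arguments the rule's shell command passes in.

    Boilerplate that every shell-invoked script must redefine; with the
    script directive this mapping is handled by the snakemake object.
    """
    parser = argparse.ArgumentParser(description="Summarise a count table.")
    parser.add_argument("in_file", help="input table (from {input})")
    parser.add_argument("threshold", type=int,
                        help="cut-off value (from {params.argument1})")
    return parser.parse_args(argv)

# When invoked by the rule, sys.argv carries the values; here we pass
# example values explicitly so the sketch runs standalone.
args = parse_args(["counts.tsv", "10"])
```

Note that every value also arrives as a string, so type conversion (type=int above) has to be declared per argument, whereas the snakemake object preserves Python values set in params.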
The notebook directive might be useful in scenarios that require interactive reproduction/development.
All in all, there are no hard rules, and for a given workflow one approach might be more suitable/convenient than other approaches.