According to the Snakemake documentation, the include directive can incorporate all of the rules of another workflow into the main workflow, and those rules apparently show up in the output of snakemake --dag -n | dot -Tsvg > dag.svg. Sub-workflows, on the other hand, are executed prior to the main workflow when you write rules that depend on their output.
My question is: how are these two really different? Right now I am working on a workflow, and it seems I can get by with just include and putting the name of the output in rule all of the main workflow. I could probably even place the output in the input of a main-workflow rule, so that the included rules run before that rule. Additionally, for whatever reason, I can't visualize a DAG that includes the sub-workflow. What do sub-workflows offer that the include directive doesn't?
The include directive doesn't "incorporate another workflow". It just adds the rules from another file, as if you had copied and pasted them (with the minor difference that include doesn't affect your target rule). A subworkflow, by contrast, has an isolated set of rules that work together to produce the final target file of that subworkflow. It is therefore well structured and isolated from both the main workflow and other subworkflows.
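To make the difference concrete, here is a minimal sketch of a main Snakefile that uses both mechanisms. Every file name, path, and rule name in it (rules/preprocess.smk, ../upstream, results/..., upstream, report) is hypothetical and chosen only for illustration, and it assumes a Snakemake version that still supports the subworkflow directive (newer releases steer users toward modules instead).

```snakemake
# Snakefile of the main workflow. All paths and names are hypothetical.

# include: the rules from the other file become part of THIS workflow,
# as if pasted here. The default target is still the first rule defined
# in this file (rule all), even though the include appears above it.
include: "rules/preprocess.smk"  # assumed to define a rule writing results/clean.txt

rule all:
    input:
        "results/clean.txt",   # produced by a rule from the included file
        "results/report.txt"   # produced by rule report below

# subworkflow: a separate workflow with its own Snakefile and its own
# working directory, kept apart from the rules of this file.
subworkflow upstream:
    workdir:
        "../upstream"
    snakefile:
        "../upstream/Snakefile"

rule report:
    input:
        # Wrapping the path in upstream(...) marks it as an output of the
        # subworkflow; the path is interpreted relative to its workdir.
        upstream("results/table.txt")
    output:
        "results/report.txt"
    shell:
        "cp {input} {output}"
```

With this layout, the included rules share the main workflow's DAG, while results/table.txt is created or updated by running the upstream workflow first, after which the main workflow runs.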
That said, in my personal experience there are some bugs in Snakemake that make subworkflows quite difficult to use. Including a file is straightforward and easy.