Search code examples
workflowsnakemake

What is the practical difference between a sub-workflow and the includes directive? [Snakemake]


In the Snakemake documentation, the includes directive can incorporate all of the rules of another workflow into the main workflow and apparently can show up in snakemake --dag -n | dot -Tsvg > dag.svg. Sub-workflows, on the other hand, can be executed prior to the main workflow should you develop rules which depend on their output.

My question is: how are these two really different? Right now, I am working on a workflow, and it seems like I can get by on just using includes and putting the name of the output in rule all of the main workflow. I could probably even place the output in the input of a main-workflow rule, making the includes workflow execute prior to that rule. Additionally, I can't visualize a DAG which includes the sub-workflow, for whatever reason. What do sub-workflows offer that the includes directive can't do?


Solution

  • The include doesn't "incorporate another workflow". It just adds the rules from another file, like if you add them with copy/paste (with a minor difference that include doesn't affect your target rule). The subworkflow has an isolated set of rules that work together to produce the final target file of this subworkflow. So it is well structured and isolated from both main workflow and other subworkflows.

    Anyway, my personal experience shows that there are some bugs in Snakemake that make using subworkflows quite difficult. Including the file is pretty straightforward and easy.