Tags: python, python-3.x, bioinformatics, snakemake

Why does Snakemake prefer calling a script with the script directive instead of calling it from shell?


Rules in standardized Snakemake workflows run Python scripts through the script directive, as in this template rule:

rule XXXXX:
    input:
        ...,
    output:
        ....,
    params:
        ...,
    conda:
        "../envs/python.yaml"
    script:
        "../scripts/XXXX.py"

Inside the script, the snakemake object is then available. However, this ties the script tightly to that rule, which seems like a big disadvantage.
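As a minimal sketch of what such a script can look like (the file contents, the `count_lines` helper, and the `prefix` parameter are hypothetical, not taken from any real workflow):

```python
# Hypothetical contents of a script run through the script directive.
from pathlib import Path


def count_lines(in_path, out_path, prefix=""):
    """Count the lines that start with `prefix` and write the count to out_path."""
    with open(in_path) as handle:
        n = sum(1 for line in handle if line.startswith(prefix))
    Path(out_path).write_text(f"{n}\n")
    return n


# Snakemake injects a global named `snakemake` when it runs the script.
# The guard keeps the file importable outside a workflow.
if "snakemake" in globals():
    count_lines(
        snakemake.input[0],
        snakemake.output[0],
        # Assumes the rule defines `prefix` under params (hypothetical name).
        prefix=snakemake.params.prefix,
    )
```

Keeping the work in a plain function and touching `snakemake` only in the final block is one way to limit the coupling: the function can still be imported and tested without a workflow.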

Why is this approach preferred over using the shell directive to call the script, as in this rule?

rule XXXXX:
    input:
        ...,
    output:
        ....,
    params:
        absolute_script_path = ..., # get
        argument1 = ..., 
    conda:
        "../envs/python.yaml"
    shell:
        "python {params.absolute_script_path} {input} {params.argument1} > {output}"

In this approach, the Python script is decoupled from the Snakemake rule. It also looks more cohesive, because the arguments passed to the script are visible in the rule rather than hidden in the script. I am only starting to write Snakemake workflows, so I do not understand why the first approach is preferred (or used in standardized Snakemake workflows) over the second. Am I missing something? Are there problems with the second approach? Thank you very much for any answers!


Solution

  • The script directive is somewhat more flexible: the script receives input, output, params, and the other directives as Python objects on the snakemake object, rather than as strings on the command line.

    If you follow the shell approach, you may find it cumbersome to define (or redefine) argparse or a similar parser to handle the arguments passed through the shell. It is mostly boilerplate, but it can get tedious.

    The notebook directive might be useful in scenarios that require interactive reproduction/development.

    All in all, there are no hard rules; for a given workflow, one approach may be more suitable or convenient than the other.
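To give a sense of the argparse boilerplate that the shell approach tends to need, here is a rough sketch of a script matching the call `python {params.absolute_script_path} {input} {params.argument1} > {output}`. The task (counting lines with a given prefix) and the argument name `prefix` are assumptions made up for illustration; only the call shape comes from the rule in the question.

```python
import argparse
import sys


def parse_args(argv=None):
    """Parse the two positional arguments passed by the shell command."""
    parser = argparse.ArgumentParser(description="Count lines with a given prefix.")
    parser.add_argument("input", help="input file ({input} in the rule)")
    parser.add_argument("prefix", help="hypothetical stand-in for {params.argument1}")
    return parser.parse_args(argv)


def main(argv=None, out=sys.stdout):
    args = parse_args(argv)
    with open(args.input) as handle:
        n = sum(1 for line in handle if line.startswith(args.prefix))
    # The rule redirects stdout into {output}, so the result goes to `out`.
    out.write(f"{n}\n")
    return n


# Only act as a command-line tool when arguments were actually passed.
if __name__ == "__main__" and sys.argv[1:]:
    main()
```

Every new parameter then has to be added in two places, the rule's shell string and the parser, whereas with the script directive it only needs to appear under params.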