Python / Snakemake: check condition before executing


Sorry for asking, but I have tried more than ten search terms, and all of them led to completely unrelated topics for some reason. For example, "snakemake execute code once at the beginning" leads to "how can I execute only a specific rule".

What I want to know is really simple: how can I execute code ONCE, regardless of whether the workflow runs on a cluster (I have read in the FAQ that code at global scope gets executed multiple times in a cluster environment), just to check something? That's all. For example: check whether /etc/myfolder is empty and, if it is, don't run the pipeline at all and exit the whole Snakefile right at the beginning. I have sifted through the official tutorial and didn't find this covered.

I'm aware that I could probably write that code anywhere, because it gets executed before the rules, even if I place it at the bottom. But this probably doesn't work on a cluster (FAQ). I need the code to be executed only once for the whole execution of the Snakefile. Also, what do I do if my check fails? Just sys.exit()? I'm looking for an example, but I cannot find one.


Solution

  • If you want something to run once at the start of the workflow, use the onstart handler. Alternatively, you can put plain Python code above your rules; as you point out, that code will run again for each submitted job in cluster execution. An idempotent operation, such as creating a directory and allowing it to already exist, is fine in either place. Generating a new temp folder would be OK in onstart but may cause issues in 'bare' Python code, because it would be repeated for every job.

    The other option, if you want to conditionally execute code, is to give your rule all an input function that checks a condition and builds the list of required files dynamically:

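    def all_input(wildcards):
        # Files that rule all always requests (placeholder names).
        outputs = 'list of files you normally want'.split()
        # `condition` is a placeholder for whatever check you need.
        if condition:
            outputs += 'more files if condition is true'.split()
        return outputs

    rule all:
        input: all_input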
    

    Re-evaluating this function shouldn't affect job submissions unless it has other side effects, like creating temp files.
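For the specific check in the question (stop if /etc/myfolder is empty), a minimal sketch in plain Python might look like the following. The helper names `dir_is_empty` and `require_nonempty_dir` are hypothetical, not part of Snakemake, and treating a missing directory the same as an empty one is an assumption about what the asker wants.

```python
import os
import sys


def dir_is_empty(path):
    """Return True if `path` is missing, not a directory, or has no entries."""
    if not os.path.isdir(path):
        return True
    with os.scandir(path) as entries:
        # next() yields the first entry, or None if the directory is empty.
        return next(entries, None) is None


def require_nonempty_dir(path):
    """Exit with a non-zero status if `path` is missing or empty."""
    if dir_is_empty(path):
        # sys.exit() with a string prints it to stderr and exits with status 1.
        sys.exit(f"Refusing to run: {path} is missing or empty")


# In a Snakefile this call would sit above the first rule, for example:
# require_nonempty_dir("/etc/myfolder")
```

Placed above the first rule, the check would run when the Snakefile is parsed, before any job is scheduled. As noted above, in cluster execution it would presumably run again for each submitted job, but since it only reads the directory and creates nothing, repeating it should be harmless.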