
Snakemake: variable that indicates whether the process is a submitted cluster job or the main Snakefile process


At the start of my Snakefile I have a long-running function, somefunc, which helps decide the "input" to rule all. When I ran the workflow with Slurm, I realized that somefunc is executed by every job. Is there a variable I can access that tells me whether the code is running in a submitted job or in the main process? Something like:

if not snakemake.submitted_job:
    config['layout'] = somefunc()

...

Solution

  • As discussed with @dariober, the cleanest approach seems to be checking whether the (hidden) .snakemake directory contains locks, since they do not appear to be created until the first rule starts (assuming you are not using the --nolock argument).

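    import os

    # Guard against the lock directory not existing yet before listing it
    lock_dir = ".snakemake/locks"
    locked = os.path.isdir(lock_dir) and len(os.listdir(lock_dir)) > 0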
    

    However this results in a problem in my case:

    import time
    import os
    
    
    def longfunc():
        time.sleep(10)
        return range(5)
    
    locked = len(os.listdir(".snakemake/locks")) > 0
    if not locked:
        # Skipped once locks exist, so `info` is never assigned inside the jobs
        info = longfunc()
    
    
    rule all:
        input:
            expand("test_{sample}", sample=info)
    
    
    
    rule test:
        output:
            touch("test_{sample}")
        # `shell:` runs the command; under `run:` the string is a no-op Python expression
        shell:
            """
            sleep 1
            """
    

    Snakemake apparently has each job re-interpret the complete Snakefile. Once locks exist, the jobs skip the branch that assigns info, so they all fail with "name 'info' is not defined" when rule all is parsed. For me it was easiest to compute the result once, store it on disk, and load it in each job (pickle.dump and pickle.load).
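
The store-and-load approach with pickle.dump and pickle.load could look roughly like the sketch below. It is a minimal illustration, not the exact code I used: the file name layout_cache.pkl and the helper load_or_compute are hypothetical, and longfunc stands in for the real long-running function. The first process to parse the Snakefile computes and caches the result, and every later process (including each submitted job) only reads the file.

```python
import os
import pickle
import tempfile
import time

CACHE_PATH = "layout_cache.pkl"  # hypothetical cache file name


def longfunc():
    """Stand-in for the expensive function from the question."""
    time.sleep(0.1)
    return list(range(5))


def load_or_compute(path, compute):
    """Return the pickled result at `path`, computing and caching it if absent."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = compute()
    # Write to a temporary file in the same directory, then rename it into
    # place, so that a concurrently starting job never reads a partial file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as fh:
        pickle.dump(result, fh)
    os.replace(tmp_path, path)
    return result


# In the Snakefile this runs on every parse, but only the first parse is slow.
info = load_or_compute(CACHE_PATH, longfunc)
```

With this in place, rule all can keep using expand("test_{sample}", sample=info) unchanged. The cache is never invalidated automatically, so the file has to be deleted by hand whenever the result of longfunc should be recomputed.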