
Execute Snakemake rule repeatedly until certain conditions are met


I'd like to use Snakemake for a flow which requires that a certain step be repeatedly performed until certain conditions are met. It's impossible to determine in advance how many times the step will be needed. It could be 1 or 6 or any other number.

My gut feeling is this is something Snakemake cannot do because Directed Acyclic Graph and all...

I was hoping that a checkpoint might be helpful, though, because it triggers a reevaluation of the DAG, but I just can't understand exactly how it works.

Is a loop in the Snakefile possible?

Thanks!


Adding some commentary on what's actually happening in the excellent answer below. Hopefully it helps others and myself when I inevitably revisit this question.

all:  call function all_input to determine the rule's input requirements.
all_input:  file "succes.txt" doesn't exist.  request checkpoint keep_trying with i == 1.
keep_trying:  output "round_1" doesn't exist.  do run section.  random() doesn't exceed 0.9, so only output[0], which is "round_1", is touched.

Snakemake reevaluates the graph after the checkpoint is complete

all:  call function all_input to determine the rule's input requirements.
all_input:  file "succes.txt" doesn't exist.  request checkpoint keep_trying with i == 2.
keep_trying:  output "round_2" doesn't exist.  do run section.  random() doesn't exceed 0.9, so only output[0], which is "round_2", is touched.

Snakemake reevaluates the graph after the checkpoint is complete

all:  call function all_input to determine the rule's input requirements.
all_input:  file "succes.txt" doesn't exist.  request checkpoint keep_trying with i == 3.
keep_trying:  output "round_3" doesn't exist.  do run section.  random() exceeds 0.9, so "succes.txt" is touched, followed by output[0], which is "round_3".

Snakemake reevaluates the graph after the checkpoint is complete

all:  call function all_input to determine the rule's input requirements.
all_input:  file "succes.txt" exists.  return "succes.txt" to rule all.
all:  input requirement is "succes.txt", which is now satisfied.
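
For anyone who wants to poke at the control flow without Snakemake, the trace above can be imitated in plain Python. This is only a rough sketch of the idea, not how Snakemake is implemented: the `while` loop stands in for the repeated graph reevaluation, and the names `run_until_success`, `workdir`, `rng` and `max_rounds` are made up for this illustration.

```python
import random
import tempfile
from pathlib import Path


def keep_trying(workdir: Path, i: int, rng: random.Random) -> None:
    """Stand-in for the checkpoint: maybe create succes.txt, always create round_<i>."""
    if rng.random() > 0.9:
        (workdir / "succes.txt").touch()
    (workdir / f"round_{i}").touch()


def run_until_success(workdir: Path, rng: random.Random, max_rounds: int = 1000) -> int:
    """Imitate the reevaluation loop and return the number of rounds that ran."""
    tries = 0
    # Each pass through the loop corresponds to one "all_input" step in the trace.
    while not (workdir / "succes.txt").exists():
        if tries >= max_rounds:  # hypothetical safety limit, not part of the answer
            raise RuntimeError("gave up: succes.txt never appeared")
        tries += 1
        keep_trying(workdir, tries, rng)
    return tries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rounds = run_until_success(Path(tmp), random.Random())
        print(f"succes.txt appeared after {rounds} round(s)")
```

The number of rounds differs from run to run, which matches the "could be 1 or 6 or any other number" situation in the question.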

Solution

  • You are right that you need checkpoints for this! Here is a little example that does what you want:

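    import os
    from pathlib import Path
    
    
    # Number of the most recently requested round; shared across calls to all_input.
    tries = 0
    
    def all_input(wildcards):
        global tries
        if not os.path.exists("succes.txt"):
            # Request the next round. While round_{i} has not been produced yet,
            # get() raises an exception that Snakemake catches; it then runs the
            # checkpoint and calls all_input again afterwards.
            tries += 1
            checkpoints.keep_trying.get(i=tries)
        else:
            return "succes.txt"
    
    
    rule all:
        input:
            all_input
    
    
    checkpoint keep_trying:
        output:
            "round_{i}"
        run:
            import random
            # Roughly a 10% chance per round of creating the success marker.
            if random.random() > 0.9:
                Path('succes.txt').touch()
            # Always create the round's own output so the checkpoint completes.
            Path(output[0]).touch()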
    

    Here we say that rule all needs as input whatever the function all_input returns. This function checks whether the file succes.txt already exists. If it doesn't, it triggers a run of the checkpoint keep_trying, which may create succes.txt (10% chance per run). Once succes.txt exists, it becomes the input for rule all, and Snakemake exits successfully.
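
One possible variation, offered as a sketch rather than a tested recipe: instead of the global `tries` counter, the round number could be derived from the `round_*` files already on disk. That way the number does not depend on how often the input function happens to be called, and a rerun after an interruption continues from the last finished round. The helper `next_round` and the constant `ROUND_PATTERN` are hypothetical names introduced here, and I have not verified the behaviour against a particular Snakemake version.

```python
import re
import tempfile
from pathlib import Path

# Matches the checkpoint outputs from the example, e.g. "round_3".
ROUND_PATTERN = re.compile(r"round_(\d+)")


def next_round(workdir: Path = Path(".")) -> int:
    """Return 1 + the highest N for which round_N exists in workdir (1 if none)."""
    finished = [
        int(match.group(1))
        for path in workdir.iterdir()
        if (match := ROUND_PATTERN.fullmatch(path.name))
    ]
    return max(finished, default=0) + 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        print(next_round(workdir))  # prints 1: no rounds have run yet
        (workdir / "round_1").touch()
        (workdir / "round_2").touch()
        print(next_round(workdir))  # prints 3: rounds 1 and 2 are done
```

In the Snakefile, the assumption is that `all_input` would then call `checkpoints.keep_trying.get(i=next_round())` in place of incrementing `tries`.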