I'd like to use Snakemake for a workflow in which a certain step must be repeated until certain conditions are met. It's impossible to determine in advance how many times the step will be needed. It could be 1 or 6 or any other number.
My gut feeling is that this is something Snakemake cannot do, since a workflow has to be a directed acyclic graph (DAG)...
I was hoping that a checkpoint might be helpful, though, because it triggers a reevaluation of the DAG, but I just can't understand exactly how it works.
Is a loop in the Snakefile possible?
Thanks!
Adding some commentary on what's actually happening in the excellent answer below. Hopefully it helps others and myself when I inevitably revisit this question.
all: call function all_input to determine rule's input requirements.
all_input: file "succes.txt" doesn't exist. do checkpoint keep_trying with i == 1.
keep_trying: output "round_1" doesn't exist. do run section. random() does not hit the success case, so only output[0], which is "round_1", is touched.
snakemake reevaluates graph after checkpoint is complete
all: call function all_input to determine rule's input requirements.
all_input: file "succes.txt" doesn't exist. do checkpoint keep_trying with i == 2.
keep_trying: output "round_2" doesn't exist. do run section. random() does not hit the success case, so only output[0], which is "round_2", is touched.
snakemake reevaluates graph after checkpoint is complete
all: call function all_input to determine rule's input requirements.
all_input: file "succes.txt" doesn't exist. do checkpoint keep_trying with i == 3.
keep_trying: output "round_3" doesn't exist. do run section. random() decides to touch "succes.txt"; output[0], which is "round_3", is touched as well.
snakemake reevaluates graph after checkpoint is complete
all: call function all_input to determine rule's input requirements.
all_input: file "succes.txt" exists. return "succes.txt" to rule all.
all: input requirement is "succes.txt", which is now satisfied.
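The trace above can be mimicked in plain Python. This is only a rough sketch of the control flow, not Snakemake itself: the `IncompleteCheckpoint` exception, the `simulate` function, and the in-memory `files` set are all made up for illustration, standing in for Snakemake's internal "checkpoint not finished yet" signal and for the real filesystem.

```python
import random


class IncompleteCheckpoint(Exception):
    """Hypothetical stand-in for 'this checkpoint has not run yet'."""


def simulate(seed=0, threshold=0.9):
    rng = random.Random(seed)
    files = set()  # pretend filesystem
    tries = 0

    def keep_trying(i):
        # Mirrors the checkpoint's run section: maybe succeed,
        # and always create the round_{i} output.
        if rng.random() > threshold:
            files.add("succes.txt")
        files.add(f"round_{i}")

    def all_input():
        # Mirrors the input function: ask for another round
        # until succes.txt shows up.
        nonlocal tries
        if "succes.txt" in files:
            return "succes.txt"
        tries += 1
        raise IncompleteCheckpoint(tries)

    # Mirrors the "run checkpoint, then reevaluate the graph" cycle.
    while True:
        try:
            return all_input(), tries, sorted(files)
        except IncompleteCheckpoint as exc:
            keep_trying(exc.args[0])


if __name__ == "__main__":
    target, tries, files = simulate()
    print(f"{target} available after {tries} round(s): {files}")
```

The point of the sketch is that the "loop" never appears in the graph: each pass is a fresh evaluation of the input function after one more checkpoint has finished.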
You are right that you need checkpoints for this! Here is a little example that does what you want:
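import os
from pathlib import Path

# Counts how many rounds have been requested so far.
tries = 0

def all_input(wildcards):
    global tries
    if not os.path.exists("succes.txt"):
        # Request the next round; Snakemake runs the checkpoint and
        # then re-evaluates this function.
        tries += 1
        checkpoints.keep_trying.get(i=tries)
    else:
        return "succes.txt"

rule all:
    input:
        all_input

checkpoint keep_trying:
    output:
        "round_{i}"
    run:
        import random
        # 10% chance of success per round.
        if random.random() > 0.9:
            Path('succes.txt').touch()
        # Always mark this round as done.
        Path(output[0]).touch()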
Here we say that rule all needs as input whatever the function all_input returns. This function checks whether the file succes.txt already exists. If it doesn't, it triggers a run of the checkpoint keep_trying, which always creates its round_{i} output and might also create the succes.txt file (10% chance). Once the checkpoint has finished, Snakemake re-evaluates all_input. If succes.txt exists at that point, it becomes the input for rule all, and Snakemake exits successfully.
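One thing to watch, as far as I know: Snakemake may evaluate an input function more than once, so a global counter can skip round numbers. A possible alternative is to derive the next round from the files already on disk. The helper below is a sketch of that idea; `next_round` is a hypothetical name, and it assumes the round files are called `round_<number>` as in the example.

```python
import re
from pathlib import Path


def next_round(directory="."):
    """Return 1 + the highest N among existing round_N files (1 if none)."""
    done = [
        int(m.group(1))
        for p in Path(directory).iterdir()
        # Only names that are exactly round_<digits> count as rounds.
        if (m := re.fullmatch(r"round_(\d+)", p.name))
    ]
    return max(done, default=0) + 1
```

With this, the input function could call `checkpoints.keep_trying.get(i=next_round())` instead of incrementing `tries`. Calling it repeatedly before the checkpoint has run returns the same number each time.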