snakemake

Does Snakemake have states during a workflow?


Does Snakemake support state in pipelines? That is, can the current run change depending on, e.g., the last 10 runs? For example: data is being processed, and if the current value is greater than X and at least 5 of the last 10 values were also greater than X, then I want the workflow to branch differently; otherwise it should continue normally.


Solution

  • You could use a slightly hacky workaround with checkpoints and multiple Snakefiles to achieve conditional execution of rules.

    For example, the first Snakefile includes a checkpoint, i.e. a rule after which Snakemake re-evaluates the workflow, so that later steps can depend on what it produced. Here you could check your conditions against the current pipeline and previous results. In the example code I'm just using a random number to determine what the checkpoint does.

    
    
    rule all:
        input:
            "random_number.txt",
            "next_step.txt"
    
    
    rule random_number:
        output: "random_number.txt"
        run:
            import numpy as np
            r = np.random.choice([0, 1])
            with open(output[0], 'w') as fh:
                fh.write(f"{r}")
    
    
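    checkpoint next_rule:
        # declare the dependency so that random_number always runs first
        input: "random_number.txt"
        output: "next_step.txt"
        run:
            # read the random number written by rule random_number
            with open(input[0], 'r') as rn:
                num = int(rn.read())
            print(num)
    
            # write the chosen branch for the second Snakefile to read
            if num == 0:
                with open(output[0], 'w') as fh:
                    fh.write("case_a")
            elif num == 1:
                with open(output[0], 'w') as fh:
                    fh.write("case_b")
            else:
                raise ValueError(f"unexpected value: {num}")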
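
    The random number above only stands in for the real condition. As a rough sketch of the check described in the question (current value greater than X, and at least 5 of the last 10 values also greater than X), the checkpoint's run block could call something like the following. The function names, the JSON history file and the mapping of the result to "case_a"/"case_b" are my assumptions for illustration, not anything Snakemake provides.

```python
import json
from pathlib import Path

WINDOW = 10        # how many previous values to remember (from the question)
MIN_EXCEEDING = 5  # how many of them must exceed the threshold


def update_history(path, current, window=WINDOW):
    """Return the previously stored values, then store them plus `current`.

    The history file is a hypothetical JSON list kept outside the workflow;
    only the newest `window` values are written back.
    """
    path = Path(path)
    history = json.loads(path.read_text()) if path.exists() else []
    path.write_text(json.dumps((history + [current])[-window:]))
    return history


def choose_case(current, history, threshold, min_exceeding=MIN_EXCEEDING):
    """Pick the branch name that the checkpoint writes to its output file."""
    exceeding = sum(1 for value in history if value > threshold)
    if current > threshold and exceeding >= min_exceeding:
        return "case_b"  # assumed to be the "branch differently" case
    return "case_a"      # assumed to be the normal case
```

    Inside the checkpoint you would then write choose_case(value, update_history("history.json", value), X) to output[0] instead of branching on the random number. Note that the history file is modified on every run and is not an output that Snakemake tracks, so this part of the state lives outside the workflow.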
    
    
    

    Then you could have a second snakefile with a conditional rule all, i.e. a list of output files that depends on the result of the first pipeline.

    
    # read the branch chosen by the first Snakefile
    with open("next_step.txt", 'r') as fh:
        case = fh.read().strip()
    
    outputs = []
    
    if case == "case_a":
        outputs = ["output_a_0.txt", "output_a_1.txt"]
    elif case == "case_b":
        outputs = ["output_b_0.txt", "output_b_1.txt"]
    
    rule all:
        input:
            outputs