I'm working on a bioinformatics pipeline which must be able to run different rules to produce different outputs based on the contents of an input file:
def foo(file):
    '''
    Function will read the file contents and output a boolean value based on its contents
    '''
    # Code to read file here...
    return bool

rule check_input:
    input: "input.txt"
    run:
        bool = foo("input.txt")

rule bool_is_True:
    input: "input.txt"
    output: "out1.txt"
    run:
        # Some code to generate out1.txt. This rule is supposed to run only if foo("input.txt") is True

rule bool_is_False:
    input: "input.txt"
    output: "out2.txt"
    run:
        # Some code to generate out2.txt. This rule is supposed to run only if foo("input.txt") is False
How do I write my rules to handle this situation? Also, how do I write my first rule (rule all) if the output files are unknown before the rule check_input is executed?
Thanks!
You're right: Snakemake has to know which files to produce before it executes any rules. I therefore suggest using a function that reads what you called "the input file" and defines the targets of the workflow accordingly.
For example:

def getTargetsFromInput():
    targets = list()
    ## read file and add target files to targets
    return targets

rule all:
    input: getTargetsFromInput()
...
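To make the idea more concrete, here is a minimal sketch of how getTargetsFromInput could choose between out1.txt and out2.txt from the question. The body of foo is a hypothetical stand-in (it only checks whether the first line is "yes"), and the snake_case name get_targets_from_input is my own; your real test on the file contents will differ.

```python
from pathlib import Path


def foo(path):
    """Hypothetical stand-in for the question's foo().

    Assumption for illustration only: the file "is True" when its
    first line is the word "yes".
    """
    lines = Path(path).read_text().splitlines()
    return bool(lines) and lines[0].strip().lower() == "yes"


def get_targets_from_input(path="input.txt"):
    """Return the list of files that rule all should request."""
    # This runs while the Snakefile is being parsed, so the input file
    # must already exist before snakemake is started.
    return ["out1.txt"] if foo(path) else ["out2.txt"]
```

With this function as the input of rule all, only the rule that produces the requested file (bool_is_True or bool_is_False) should be scheduled, and rule check_input is no longer needed.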
You can define the path of the input file with the --config argument on the snakemake command line, or use a structured input file (YAML or JSON) and load it with the configfile: keyword in the Snakefile: https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html
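Inside the Snakefile, values passed with --config or loaded with configfile: end up in the global dictionary named config. As a rough sketch, a small helper like the one below could pick the input path from it. The key name input_file and the helper name are assumptions I made up for the example; they are not defined by Snakemake.

```python
def input_path_from_config(config, default="input.txt"):
    """Return the input file path stored in a Snakemake-style config dict."""
    # "input_file" is a hypothetical key; it would be set for example with
    #   snakemake --config input_file=other_input.txt
    # and the default is used when the key is absent.
    return config.get("input_file", default)
```

In the Snakefile you would call it as input_path_from_config(config) and pass the result to the function that builds the targets.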