I keep getting the same errors at the same step in the pipeline. I have two rules, typing and apply_qc, which somehow conflict. typing uses the outputs of another rule, polish_consensus, and apply_qc uses the outputs of typing (so the order is polish_consensus > typing > apply_qc). The outputs of typing are a FASTA and a CSV file. apply_qc is a quality control step that censors the data in these files when the quality is low. Now I keep getting the same errors with these rules:
1. The code:
rule typing:
    input:
        f"{DATA_FOLDER}/vaccines.fasta",
        rules.polish_consensus.output
    output:
        temp(f"{OUTPUT_FOLDER}/typing/{{samplename}}.csv")
    script:
        "../scripts/typing.py"

rule apply_qc:
    input:
        rules.typing.output,
        rules.polish_consensus.output,
        rules.featurecounts.output.summary
    output:
        typing=f"{OUTPUT_FOLDER}/typing/{{samplename}}_qcpass.csv",
        consensus=f"{OUTPUT_FOLDER}/consensus/{{samplename}}_qcpass.fasta"
    script:
        "../scripts/apply_qc.py"
The output of the rule polish_consensus is output/consensus/{samplename}.fasta, with samplename=prrsv12.
The error:
InputFunctionException in rule typing in file /home/lisah/Pycharm/minor-HTHPC/snakemake/workflow/rules/typing.smk, line 1:
Error:
KeyError: 'prrsv12_qcpass'
Wildcards:
samplename=prrsv12_qcpass
Traceback:
File "/home/lisah/Pycharm/minor-HTHPC/snakemake/workflow/rules/typing.smk", line 12, in <lambda>
The error shows that the wildcard used is prrsv12_qcpass, but the wildcard should be prrsv12; prrsv12_qcpass is part of the filename of an output of the apply_qc rule.
AmbiguousRuleException:
Rules apply_qc and typing are ambiguous for the file output/typing/prrsv12_qcpass.csv.
Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
Wildcards:
apply_qc: samplename=prrsv12
typing: samplename=prrsv12_qcpass
Expected input files:
apply_qc: output/typing/prrsv12.csv output/consensus/prrsv12.fasta output/counts/prrsv12_summary.csv
typing: data/prrsv/vaccines.fasta output/consensus/prrsv12_qcpass.fasta
Expected output files:
apply_qc: output/typing/prrsv12_qcpass.csv output/consensus/prrsv12_qcpass.fasta
typing: output/typing/prrsv12_qcpass.csv
As said before, the wildcard for typing is wrong, but the wildcard for apply_qc is correct (?????). Likewise, the expected input for typing should not be output/consensus/prrsv12_qcpass.fasta but output/consensus/prrsv12.fasta.
I hoped I had fixed the AmbiguousRuleException by using the rules.<rule>.output syntax and adding a ruleorder. As for the first error, I am completely lost and have no idea why it happens. It seems like an error with the wildcard, but I have no idea how the _qcpass part gets added to the wildcard. The error also seems to occur at random: some runs work fine and others crash with it (yes, run with the same data).
EDIT:
I tried running it with --debug-dag, and the only thing that stood out is the following:
selected job readcap
wildcards: samplename=prrsv20_qcpass
file output/fastq/prrsv20_qcpass_readcap.fastq.gz:
Producer found, hence exceptions are ignored.
candidate job select_centroid
wildcards: samplename=prrsv20_qcpass
candidate job featurecounts
wildcards: samplename=prrsv20_qcpass
candidate job map2ref
wildcards: samplename=prrsv20_qcpass
candidate job apply_qc
wildcards: samplename=prrsv20
selected job apply_qc
wildcards: samplename=prrsv20
The _qcpass is added to the wildcard for the rest of the pipeline, but it seems to work fine for apply_qc? apply_qc is one of the last rules in the pipeline...
It sounds like you misunderstood what ambiguous rules mean for snakemake, why you should avoid them, and why ruleorder will not solve your problem.
First of all, here's an MWE, a minimal working example that reproduces your issue. Note that it is easier for everyone if you provide such an example and the call used to run snakemake.
In this case, the problem can be reproduced by calling snakemake -call:
rule polish_consensus:
    output:
        "consensus/{samplename}.fasta",
    shell:
        """
        echo polish_consensus > {output[0]}
        """

rule typing:
    input:
        rules.polish_consensus.output,
    output:
        "typing/{samplename}.csv",
    shell:
        """
        cat {input[0]} > {output[0]}
        """

rule apply_qc:
    input:
        rules.typing.output,
        rules.polish_consensus.output,
    output:
        typing="typing/{samplename}_qcpass.csv",
        consensus="consensus/{samplename}_qcpass.fasta",
    shell:
        """
        echo qcpass > {output[0]}
        echo qcpass > {output[1]}
        """

rule all:
    default_target: True
    input:
        expand(rules.apply_qc.output[0], samplename="prrsv12"),
        expand(rules.apply_qc.output[1], samplename="prrsv12"),
Your wildcard {samplename} will be matched by snakemake against all the output files you request, as well as any files snakemake has to generate to run the workflow.
Requesting typing/prrsv12_qcpass.csv therefore matches the output of rule apply_qc with samplename=prrsv12 as well as the output of rule typing with samplename=prrsv12_qcpass. To prevent this, you should constrain your wildcard rather than relying on ruleorder or on references to rules.<name>.output.
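The double match described above can be illustrated outside of snakemake with plain regular expressions. This is only a rough sketch, not snakemake's actual implementation: it assumes each {wildcard} behaves like the regex .+ (snakemake's documented default), and the helper names pattern_to_regex and candidate_rules are made up for this example.

```python
import re

# Output patterns of the two rules from the MWE.
OUTPUTS = {
    "typing": "typing/{samplename}.csv",
    "apply_qc": "typing/{samplename}_qcpass.csv",
}


def pattern_to_regex(pattern, constraint=".+"):
    """Turn an output pattern into a regex; each {name} becomes a named group.

    Hypothetical helper: it assumes every wildcard name occurs only once.
    """
    # re.split with a capturing group yields: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pieces.append(re.escape(part))  # fixed part of the filename
        else:
            pieces.append(f"(?P<{part}>{constraint})")  # the wildcard
    return re.compile("".join(pieces))


def candidate_rules(target, constraint=".+"):
    """Return every rule whose output pattern matches the requested file."""
    hits = {}
    for rule, pattern in OUTPUTS.items():
        match = pattern_to_regex(pattern, constraint).fullmatch(target)
        if match:
            hits[rule] = match.groupdict()
    return hits


print(candidate_rules("typing/prrsv12_qcpass.csv"))
# {'typing': {'samplename': 'prrsv12_qcpass'}, 'apply_qc': {'samplename': 'prrsv12'}}
```

Both rules claim the same file, each with a different value for samplename, which is the situation the AmbiguousRuleException reports.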
By using wildcard_constraints you tell snakemake which strings a wildcard can match. In your case, your samplename is presumably never going to contain an underscore, i.e. you can use:
wildcard_constraints:
    samplename="[a-zA-Z0-9]+",
to tell snakemake to match lowercase/uppercase letters and the digits 0-9, but not whitespace or other symbols such as the underscore. With this, snakemake will never consider prrsv12_qcpass as the value of the wildcard samplename, but only prrsv12 as the wildcard value, with _qcpass as an additional, fixed part of the filename.
More on wildcard_constraints can be found in the documentation.
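To see why the constraint removes the ambiguity, here is a small sketch using Python's re module. It is an approximation of the matching, not snakemake's own code: the two regexes are hand-written stand-ins for the output patterns of typing and apply_qc, with the wildcard replaced by the constraint [a-zA-Z0-9]+.

```python
import re

# The constraint on samplename: letters and digits only, no underscore.
CONSTRAINT = "[a-zA-Z0-9]+"

# Hand-written stand-ins for the output patterns of the two rules.
typing_re = re.compile(rf"typing/(?P<samplename>{CONSTRAINT})\.csv")
apply_qc_re = re.compile(rf"typing/(?P<samplename>{CONSTRAINT})_qcpass\.csv")

target = "typing/prrsv12_qcpass.csv"

# typing can no longer match: "prrsv12_qcpass" contains an underscore.
typing_match = typing_re.fullmatch(target)

# apply_qc still matches, with "_qcpass" as a fixed part of the filename.
apply_qc_match = apply_qc_re.fullmatch(target)

print(typing_match)                        # None
print(apply_qc_match.group("samplename"))  # prrsv12
```

Only one rule is left as a producer for the requested file, so there is nothing for snakemake to disambiguate.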
Putting everything together into a single Snakefile:
# Restrict samplename to letters and digits so "_qcpass" can never be part of it.
wildcard_constraints:
    samplename="[a-zA-Z0-9]+",

rule polish_consensus:
    output:
        "consensus/{samplename}.fasta",
    shell:
        """
        echo polish_consensus > {output[0]}
        """

rule typing:
    input:
        rules.polish_consensus.output,
    output:
        "typing/{samplename}.csv",
    shell:
        """
        cat {input[0]} > {output[0]}
        """

rule apply_qc:
    input:
        rules.typing.output,
        rules.polish_consensus.output,
    output:
        typing="typing/{samplename}_qcpass.csv",
        consensus="consensus/{samplename}_qcpass.fasta",
    shell:
        """
        echo qcpass > {output[0]}
        echo qcpass > {output[1]}
        """

rule all:
    default_target: True
    input:
        expand(rules.apply_qc.output[0], samplename="prrsv12"),
        expand(rules.apply_qc.output[1], samplename="prrsv12"),