Adding .g.vcf instead of .vcf after the variable in expand rule is somehow adding the .g to a wildcard in another module
I have tried the following in the all rule :
{stuff}.g.vcf
{stuff}"+"g.vcf"
{stuff}_var"+".g.vcf"
{stuff}.t.vcf
all fail but {stuff}.gvcf or {stuff}.vcf work
Error:
InputFunctionException in line 21 of snake_modules/mark_duplicates.snakefile: KeyError: 'Mother.g' Wildcards: lane=Mother.g
Code:
LANES = config["list2"].split()
rule all:
input:
expand(projectDir+"results/alignments/variants/{stuff}.g.vcf", stuff=LANES)
rule mark_duplicates:
""" this will mark duplicates for bam files from the same sample and library """
input:
get_lanes
output:
projectDir+"results/alignments/markdups/{lane}.markdup.bam"
log:
projectDir+"logs/"+stamp+"_{lane}_markdup.log"
shell:
" input=$(echo '{input}' |sed -e s'/ / I=/g') && java -jar /home/apps/pipelines/picard-tools/CURRENT MarkDuplicates I=$input O={projectDir}results/alignments/markdups/{wildcards.lane}.markdup.bam M={projectDir}results/alignments/markdups/{wildcards.lane}.markdup_metrics.txt &> {log}"
I want my final output to have the {stuff}.g.vcf notation. Please note this output is created in another snake module but the error appears in the mark duplicates which is before the other module.
I have tried multiple changes but it is the .g.vcf in the all rule that causes the issue.
My guess is that {lane}
is interpreted as a regular expression and it's capturing more than it should. Try adding before rule all
:
wildcard_constraints:
stuff= '|'.join([re.escape(x) for x in LANES]),
lane= '|'.join([re.escape(x) for x in LANES])
(See also this thread https://groups.google.com/forum/#!topic/snakemake/wVlJW9X-9EU)