How to stop snakemake from adding non file endings to wildcards when using expand function? (.g.vcf fails, .vcf works)


Using .g.vcf instead of .vcf after the wildcard in the expand() call of rule all somehow causes ".g" to be appended to a wildcard in another module.

I have tried the following patterns in the all rule:

{stuff}.g.vcf 
{stuff}"+"g.vcf" 
{stuff}_var"+".g.vcf"
{stuff}.t.vcf 

All of these fail, but {stuff}.gvcf and {stuff}.vcf work.

Error:

InputFunctionException in line 21 of snake_modules/mark_duplicates.snakefile: KeyError: 'Mother.g' Wildcards: lane=Mother.g

Code:

LANES = config["list2"].split()

rule all:
    input:
         expand(projectDir+"results/alignments/variants/{stuff}.g.vcf", stuff=LANES)


rule mark_duplicates:
    """ this will mark duplicates for bam files from the same sample and library """
    input:
        get_lanes
    output:
        projectDir+"results/alignments/markdups/{lane}.markdup.bam"
    log:
        projectDir+"logs/"+stamp+"_{lane}_markdup.log"
    shell:
        " input=$(echo '{input}' |sed -e s'/ / I=/g') && java -jar /home/apps/pipelines/picard-tools/CURRENT MarkDuplicates I=$input O={projectDir}results/alignments/markdups/{wildcards.lane}.markdup.bam M={projectDir}results/alignments/markdups/{wildcards.lane}.markdup_metrics.txt &> {log}"

I want my final output to be named {stuff}.g.vcf. Note that this output is created in another Snakemake module, but the error appears in mark_duplicates, which runs before that module.

I have tried multiple changes, but it is the .g.vcf in the all rule that causes the issue.


Solution

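  • My guess is that {lane} is interpreted as a regular expression and is capturing more than it should. By default a Snakemake wildcard matches the regex .+, so if some rule's output pattern ends in {lane}.vcf, it would also match Mother.g.vcf with lane=Mother.g. Try adding the following before rule all: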

    import re

    # Restrict both wildcards to the exact lane names so they cannot absorb ".g"
    wildcard_constraints:
        stuff='|'.join([re.escape(x) for x in LANES]),
        lane='|'.join([re.escape(x) for x in LANES])
    

    (See also this thread https://groups.google.com/forum/#!topic/snakemake/wVlJW9X-9EU)
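
The guess above can be checked outside Snakemake with plain Python. The sketch below is only an approximation of how a wildcard pattern is turned into a regex, not Snakemake's actual implementation. The lane names, the output pattern `{lane}.vcf`, and the helper `match_wildcards` are hypothetical, since the question does not show the module that produces the VCF files.

```python
import re

# Hypothetical lane names standing in for config["list2"].split()
LANES = ["Mother", "Father"]

# Hypothetical output pattern of some upstream rule (not shown in the question)
PATTERN = "results/alignments/variants/{lane}.vcf"
TARGET = "results/alignments/variants/Mother.g.vcf"


def match_wildcards(pattern, target, constraint=".+"):
    """Match target against a '{name}' pattern; return the wildcard values or None.

    Every wildcard is replaced by a named group using `constraint` as its regex.
    The default '.+' mirrors Snakemake's default wildcard regex.
    """
    # re.split with a capture group alternates literal text and wildcard names
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)  # literal text, dots escaped
        else:
            regex += f"(?P<{part}>{constraint})"  # wildcard
    m = re.match(regex + "$", target)
    return m.groupdict() if m else None


# Unconstrained: the wildcard swallows the ".g" part of the file name
unconstrained = match_wildcards(PATTERN, TARGET)

# Constrained to the known lane names: the pattern no longer matches the .g.vcf target
lane_regex = "|".join(re.escape(x) for x in LANES)
constrained = match_wildcards(PATTERN, TARGET, lane_regex)

print(unconstrained)
print(constrained)
```

With the constraint in place, a rule with this pattern can no longer be selected to produce Mother.g.vcf, so the job has to come from the rule whose output really ends in .g.vcf.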