Wildcard SyntaxError in Snakemake with no obvious cause


I keep getting an error saying that the output files of a rule do not all contain the same wildcards, and I can't figure out what is causing it:

SyntaxError:
Not all output, log and benchmark files of rule bcftools_filter contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.

...

rule merge_YRI_GTEx:
    input:
        kg=expand("kg_vcf/1kg_yri_chr{q}.vcf.gz", q=range(1,23)),
        gtex=expand("gtex_vcf/gtex_chr{v}.snps.recode.vcf.gz", v=range(1, 23))
    output:
        "merged/merged_chr{i}.vcf.gz"
    shell:
        "bcftools merge \
            -0 \
            -O z \
            -o {output} \
            {input.kg} \
            {input.gtex}"


rule bcftools_filter:
    input:
        expand("merged/merged_chr{i}.vcf.gz", i=range(1,23))
    output:
        filt="filtered_vcf/merged_filtered_chr{i}.vcf.gz",
        chk=touch(".bcftools_filter.chkpnt")
    threads:
        4
    shell:
        "bcftools filter \
            --include 'AN=1890 && AC > 0' \
            --threads {threads} \
            -O z \
            -o {output.filt} \
            {input}"
...
rule list_merged_filtered_vcfs:
    input:
        ".bcftools_filter.chkpnt"
    output:
        "processed_vcf_list.txt"
    shell:
        "for i in {{1..22}}; do "
        "echo \"{config[sprime_dir]}/filtered_vcf/merged_filtered_chr${{i}}.vcf.gz\" >> \
        {output}; done"

The specific line it complains about is the one that reads just "bcftools filter \", which baffles me even more. I've tried giving the input wildcard a name, and I've scrutinized both the rule that consumes bcftools_filter's output and the rule that produces its input, to no avail. I'm not sure what is causing this error.


Solution

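  • I think the error comes from chk=touch(".bcftools_filter.chkpnt") not containing the wildcard {i}, whereas filt does. Snakemake requires all output, log and benchmark files of a rule to contain the same wildcards, because otherwise every job of the rule (one per value of i) would write to the same .bcftools_filter.chkpnt file.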

    Apart from that, I'm not sure your rule is very sensible. You are passing a list of input files (from expand(...)) to bcftools filter, but I don't think bcftools filter accepts more than one input file. Also, your rule will create the output files filtered_vcf/merged_filtered_chr{i}.vcf.gz (one for each value of i), each from the same list of input files. Are you sure you want expand("merged/merged_chr{i}.vcf.gz", i=range(1,23)) rather than just "merged/merged_chr{i}.vcf.gz", with the values of i supplied by whichever rule requests these files (for example an expand() in rule all or in a downstream rule)?
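The condition behind the error message can be illustrated in plain Python: collect the wildcard names used in each output pattern of a rule and require them all to be the same set. This is a minimal sketch, not Snakemake's actual implementation. It assumes simple {name} or {name,regex} placeholders, and wildcard_names, outputs_share_wildcards and the checkpoint name used in fixed are hypothetical.

```python
import string


def wildcard_names(pattern: str) -> set[str]:
    """Wildcard names in a Snakemake-style pattern such as "chr{i}.vcf.gz".

    Assumes simple {name} or {name,regex} placeholders; doubled braces
    ({{ }}) are literal braces, exactly as in str.format.
    """
    names = set()
    for _literal, field, _spec, _conv in string.Formatter().parse(pattern):
        if field is not None:
            # Strip an optional constraint such as {i,\d+} down to "i"
            names.add(field.split(",", 1)[0])
    return names


def outputs_share_wildcards(outputs: dict[str, str]) -> bool:
    """True if every output pattern of one rule uses the same wildcard names.

    Mirrors the condition stated in Snakemake's error message; it is an
    illustration, not Snakemake's own code.
    """
    wildcard_sets = [wildcard_names(p) for p in outputs.values()]
    return all(s == wildcard_sets[0] for s in wildcard_sets)


# Outputs of bcftools_filter as written in the question
original = {
    "filt": "filtered_vcf/merged_filtered_chr{i}.vcf.gz",
    "chk": ".bcftools_filter.chkpnt",  # no {i}: all 22 jobs would write this one file
}
# Hypothetical fix: put {i} into the checkpoint name as well
fixed = {
    "filt": "filtered_vcf/merged_filtered_chr{i}.vcf.gz",
    "chk": ".bcftools_filter_chr{i}.chkpnt",
}

for label, outputs in (("original", original), ("fixed", fixed)):
    per_file = {name: sorted(wildcard_names(p)) for name, p in outputs.items()}
    print(label, per_file, outputs_share_wildcards(outputs))
# original {'filt': ['i'], 'chk': []} False
# fixed {'filt': ['i'], 'chk': ['i']} True
```

If the per-chromosome checkpoint route is taken, list_merged_filtered_vcfs would then have to request all 22 checkpoint files, for example with expand(). A simpler option may be to drop chk entirely and give list_merged_filtered_vcfs the filtered VCFs themselves as input, via expand("filtered_vcf/merged_filtered_chr{i}.vcf.gz", i=range(1,23)).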
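The concern about expand(...) in the input can be made concrete with a small imitation of expand() built on itertools.product, plus a simplified view of how one job's input is resolved. expand_sketch and job_input are hypothetical helpers that assume only the default product combinator and no wildcard constraints. What they show: the expanded list no longer contains {i}, so the job for chromosome 5 would be handed all 22 merged files, while a plain pattern resolves to the single file for that chromosome.

```python
import itertools


def expand_sketch(pattern: str, **wildcards) -> list[str]:
    """Minimal stand-in for Snakemake's expand() with its default product
    combinator: one formatted path per combination of wildcard values."""
    keys = list(wildcards)
    # A bare string counts as a single value, not as a sequence of characters
    value_lists = [[v] if isinstance(v, str) else list(v) for v in wildcards.values()]
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in itertools.product(*value_lists)
    ]


def job_input(input_spec, wildcards: dict) -> list[str]:
    """Simplified view of how one job's input is resolved: a pattern string has
    its {wildcards} filled from the job's output; an expand()-ed list has no
    wildcards left, so every job receives the whole list unchanged."""
    if isinstance(input_spec, str):
        return [input_spec.format(**wildcards)]
    return list(input_spec)


as_written = expand_sketch("merged/merged_chr{i}.vcf.gz", i=range(1, 23))
per_chrom = "merged/merged_chr{i}.vcf.gz"

# The job that should produce filtered_vcf/merged_filtered_chr5.vcf.gz
job = {"i": 5}
print(len(job_input(as_written, job)))  # 22 files handed to one bcftools filter call
print(job_input(per_chrom, job))        # ['merged/merged_chr5.vcf.gz']
```

The same reasoning applies to merge_YRI_GTEx in the question: its output contains {i}, but both of its inputs are expanded over all 22 chromosomes, so each merge job would receive all 44 input files.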