
Snakemake - how do I input a range of different files delineated from each other numerically?


This seems like a basic question, but I keep getting some variation of the error "No values given for wildcard".

I have a group of 22 files named Ne-sQTL_perind.counts.gz.qqnorm_chr{#}.gz, with {#} running from 1 to 22. I would like to process them in a rule. My original attempt looks like this:

rule QTLtools_filter:
    input:
        file=expand("Ne-sQTL_perind.counts.gz.qqnorm_chr{i}.gz",i=range(1,22)),
        chk=".prepare_phen_table.chkpnt"
    output:
        expand("{input.file}.qtltools")
    message:
        "Making phenotype files QTLtools compatible..."
    shell:
        "cat {input.file} | awk '{ $4=$4\" . +\"; print $0 }' | tr " " \"\t\" | bgzip -c > {input.file}.qtltools"

However, I get the error "No values given for wildcard 'input'", which confuses me, because the docs have a clear example of this working with the wildcard replicates. How do I expand this wildcard so that it includes all files numbered 1 through 22? I've also tried defining a function to do this for me, at the suggestion of this SO post, to no avail; I still get the same error message.

def expandChromo(wildcards):
    return expand("Ne-sQTL_perind.counts.gz.qqnorm_chr{i}.gz",i=range(1,22))
...
rule QTLtools_filter:
    input:
        expandChromo,
        chk=".prepare_phen_table.chkpnt"
    output:
        expand("{wildcards.expandChromo}.qtltools")
    message:
        "Making phenotype files QTLtools compatible..."
    shell:
        "cat {wildcards.expandChromo} | awk '{ $4=$4\" . +\"; print $0 }' | tr " " \"\t\" | bgzip -c > {wildcards.expandChromo}.qtltools"

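Both attempts run the same per-file shell pipeline. As a rough illustration, here is a Python sketch of what the awk | tr part does to one decompressed line: awk splits the line on whitespace, appends " . +" to the fourth field and rejoins the fields with single spaces, then tr turns every space into a tab. QTLtools' phenotype BED format expects six leading columns (chromosome, start, end, phenotype ID, group ID, strand), which is presumably why a placeholder group "." and strand "+" are inserted after column 4. The function name to_qtltools_line is hypothetical, and the sketch assumes every line has at least four fields.

```python
def to_qtltools_line(line: str) -> str:
    r"""Sketch of: awk '{ $4=$4" . +"; print $0 }' | tr " " "\t"

    Hypothetical helper; assumes the line has at least four fields.
    """
    # awk's default field splitting: runs of spaces/tabs, newline dropped
    fields = line.split()
    # $4 in awk is the fourth field (awk counts from 1, Python from 0)
    fields[3] = fields[3] + " . +"
    # assigning to a field makes awk rebuild $0 joined by OFS (one space)
    rebuilt = " ".join(fields)
    # tr " " "\t": every space, including the ones just added, becomes a tab
    return rebuilt.replace(" ", "\t")


if __name__ == "__main__":
    # Made-up example row: chr, start, end, ID, then two sample values
    row = "chr1\t1000\t2000\tclu_1\t0.53\t-1.2\n"
    print(repr(to_qtltools_line(row)))
    # prints 'chr1\t1000\t2000\tclu_1\t.\t+\t0.53\t-1.2'
```

Note that the header line goes through the same transformation, so its ID column is also followed by "." and "+".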
Solution

  • You need two rules. The first one (let's call it all) has no output; it simply states, as its input, what you want to get as the result of your pipeline. Placed first in the Snakefile, it becomes the default target:

    rule all:
        # range() stops before its end value, so range(1, 23) yields 1..22
        input: expand("Ne-sQTL_perind.counts.gz.qqnorm_chr{i}.gz.qtltools", i=range(1, 23))
    

    This tells Snakemake which 22 files the pipeline must produce.

    Now you can tell Snakemake how to create those files:

    rule QTLtools_filter:
        input:
            "{file}.gz"
        output:
            "{file}.gz.qtltools"
        message:
            "Making phenotype files QTLtools compatible..."
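        shell:
            "zcat {input} | awk '{{ $4=$4\" . +\"; print $0 }}' | tr \" \" \"\t\" | bgzip -c > {output}"


    Note that this rule takes a single file as input and produces a single file as output; the {file} wildcard lets Snakemake pair each target requested by all with its input, once for each i in your range. Two details in the shell command matter: Snakemake formats the command string, so the literal braces of the awk program must be doubled ({{ and }}), otherwise Snakemake tries to substitute them and the rule fails; and since the inputs are gzip-compressed, they should be read with zcat rather than cat. Writing to {output} keeps the command in sync with the declared output file. I didn't find any reason for setting chk=".prepare_phen_table.chkpnt" as an input, but you can add it if needed; in that case, name the data file too (e.g. file="{file}.gz") and use {input.file} in the shell command, because a bare {input} would then expand to both files.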
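To make the mechanism concrete, here is a simplified Python model of what happens; it is not Snakemake's actual implementation. The expand() call in all, which for a single wildcard amounts to formatting the pattern once per value, yields the target paths. Each target is then matched against the output pattern of QTLtools_filter, where a wildcard by default matches the regular expression .+, and the inferred value of file is substituted into the input pattern. The helpers infer_wildcards and plan_jobs are hypothetical. The model also shows why range(1, 23) is needed for 22 chromosomes: range() excludes its stop value.

```python
import re
from typing import Dict, List, Optional, Tuple

TARGET_PATTERN = "Ne-sQTL_perind.counts.gz.qqnorm_chr{i}.gz.qtltools"
RULE_OUTPUT = "{file}.gz.qtltools"  # output pattern of QTLtools_filter
RULE_INPUT = "{file}.gz"            # input pattern of QTLtools_filter


def infer_wildcards(pattern: str, path: str) -> Optional[Dict[str, str]]:
    """Match path against pattern; each {name} matches '.+' (simplified model)."""
    regex, pos = "", 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])  # literal text before the wildcard
        regex += f"(?P<{m.group(1)}>.+)"             # the wildcard as a named group
        pos = m.end()
    regex += re.escape(pattern[pos:])                # literal text after the last wildcard
    match = re.fullmatch(regex, path)
    return match.groupdict() if match else None


def plan_jobs(chromosomes) -> List[Tuple[str, str]]:
    """Return (input, output) pairs, one job per requested target."""
    # expand() with a single wildcard: one formatted path per value
    targets = [TARGET_PATTERN.format(i=i) for i in chromosomes]
    jobs = []
    for target in targets:
        wildcards = infer_wildcards(RULE_OUTPUT, target)
        if wildcards is None:
            raise ValueError(f"no rule produces {target}")
        # fill the same wildcard values into the input pattern
        jobs.append((RULE_INPUT.format(**wildcards), target))
    return jobs


if __name__ == "__main__":
    print(len(plan_jobs(range(1, 22))))  # 21: range() excludes its stop value
    jobs = plan_jobs(range(1, 23))       # 22 jobs, chromosomes 1..22
    print(jobs[0][0])                    # Ne-sQTL_perind.counts.gz.qqnorm_chr1.gz
```

Because the output pattern is generic, the same rule would also match any other file requested as something.gz.qtltools; if that is too broad, Snakemake's wildcard constraints can restrict what {file} may match.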