Search code examples
snakemake

Snakemake pipeline not attempting to produce output?


I have a relatively simple snakemake pipeline but when run I get all missing files for rule all:

refseq = 'refseq.fasta'
reads = ['_R1_001', '_R2_001']

def getsamples():
    import glob
    test = (glob.glob("*.fastq"))
    print(test)
    samples = []
    for i in test:
        samples.append(i.rsplit('_', 2)[0])
    return(samples)

def getbarcodes():
    with open('unique.barcodes.txt') as file:
        lines = [line.rstrip() for line in file]
    return(lines)

rule all:
    input:
        expand("grepped/{barcodes}{sample}_R1_001.plate.fastq", barcodes=getbarcodes(), sample=getsamples()),
        expand("grepped/{barcodes}{sample}_R2_001.plate.fastq", barcodes=getbarcodes(), sample=getsamples())
    wildcard_constraints:
        barcodes="[a-z-A-Z]+$"


rule fastq_grep:
    input:
        R1 = "{sample}_R1_001.fastq",
        R2 = "{sample}_R2_001.fastq"
    output:
        out1 = "grepped/{barcodes}{sample}_R1_001.plate.fastq",
        out2 = "grepped/{barcodes}{sample}_R2_001.plate.fastq"
    
    wildcard_constraints:
        barcodes="[a-z-A-Z]+$"
    shell:
        "fastq-grep -i '{wildcards.barcodes}' {input.R1} > {output.out1} && fastq-grep -i '{wildcards.barcodes}' {input.R2} > {output.out2}"

The output files that are listed by the terminal seem correct, so it seems it is seeing what I want to produce but the shell is not making anything at all.

I want to produce a list of files that have grepped the list of barcodes I have in a file. But I get "Missing input files for rule all:"


Solution

  • There are two issues:

    1. You have an impossible wildcard_constraints defined for {barcode}
    2. Your two wildcards {barcode} and {sample} are competing with each other.

    Remove the wildcard_constraints from your two rules and add the following lines to the top of your Snakefile:

    wildcard_constraints:
        barcodes="[A-Z]+",
        sample="Well.*",
    

    The constraint for {barcodes} now only matches capital letters. Before it also included end-of-line matching (trailing $) which was impossible to match for this wildcard as you had additional text in the filepath following.

    The constraint for {sample} ensures that the path of the filename starting with "Well..." is interpreted as the start of the {sample} wildcard. Else you'd get something unwanted like barcode=ACGGTW instead of barcode=ACGGT.

    A note of advice: I usually find it easier to seperate wildcards into directory structures rather than having multiple wildcards in the same filename. In you case that would mean having a structure like grepped/{barcode}/{sample}_R1_001.plate.fastq.

    Full suggested Snakefile (formatted using snakefmt)

    wildcard_constraints:
        barcodes="[A-Z]+",
        sample="Well.*",
    
    
    refseq = "refseq.fasta"
    reads = ["_R1_001", "_R2_001"]
    
    
    def getsamples():
        import glob
    
        test = glob.glob("*.fastq")
        print(test)
        samples = []
        for i in test:
            samples.append(i.rsplit("_", 2)[0])
        return samples
    
    
    def getbarcodes():
        with open("unique.barcodes.txt") as file:
            lines = [line.rstrip() for line in file]
        return lines
    
    
    rule all:
        input:
            expand(
                "grepped/{barcodes}{sample}_R1_001.plate.fastq",
                barcodes=getbarcodes(),
                sample=getsamples(),
            ),
            expand(
                "grepped/{barcodes}{sample}_R2_001.plate.fastq",
                barcodes=getbarcodes(),
                sample=getsamples(),
            ),
    
    
    rule fastq_grep:
        input:
            R1="{sample}_R1_001.fastq",
            R2="{sample}_R2_001.fastq",
        output:
            out1="grepped/{barcodes}{sample}_R1_001.plate.fastq",
            out2="grepped/{barcodes}{sample}_R2_001.plate.fastq",
        shell:
            "fastq-grep -i '{wildcards.barcodes}' {input.R1} > {output.out1} && fastq-grep -i '{wildcards.barcodes}' {input.R2} > {output.out2}"