
Snakemake: how to change the output filename when using wildcards


I think I have a simple problem, but I don't know how to solve it.

My input folder contains files like this:

AAAAA_S1_R1_001.fastq
AAAAA_S1_R2_001.fastq
BBBBB_S2_R1_001.fastq
BBBBB_S2_R2_001.fastq

My Snakemake code:

import glob
import os

samples = [os.path.basename(x) for x in sorted(glob.glob("input/*.fastq"))]
name = []
for x in samples:
    if "_R1_" in x:
        name.append(x.split("_R1_")[0])
NAME = name

rule all:
    input:
        expand("output/{sp}_mapped.bam", sp=NAME),

rule bwa:
    input:
        R1 = "input/{sample}_R1_001.fastq",
        R2 = "input/{sample}_R2_001.fastq"
    output:
        mapped = "output/{sample}_mapped.bam"
    params:
        ref = "refs/AF086833.fa"
    run:
        shell("bwa mem {params.ref} {input.R1} {input.R2} | samtools sort > {output.mapped}")
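The loop at the top of this Snakefile keeps the S-number in each name: splitting on "_R1_" retains everything before it, so NAME becomes AAAAA_S1 and BBBBB_S2. The plain-Python sketch below (run outside Snakemake) reproduces that loop and adds a hypothetical alternative, sample_to_prefix, which uses a regular expression to separate the sample name from the S-number. Both helper names are illustrative, and the regex assumes every file follows the <sample>_S<n>_R<1|2>_001.fastq layout shown above.

```python
import re


def names_from_r1(filenames):
    """Mimic the loop in the question: keep the part before '_R1_'."""
    return [f.split("_R1_")[0] for f in filenames if "_R1_" in f]


# Hypothetical alternative: capture sample name and S-number separately.
# Assumes the layout <sample>_S<n>_R1_001.fastq; R2 files are skipped.
R1_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<snum>S\d+)_R1_001\.fastq$")


def sample_to_prefix(filenames):
    """Map each sample name (e.g. 'AAAAA') to its file prefix (e.g. 'AAAAA_S1')."""
    mapping = {}
    for f in filenames:
        m = R1_PATTERN.match(f)
        if m:
            mapping[m.group("sample")] = f"{m.group('sample')}_{m.group('snum')}"
    return mapping


files = [
    "AAAAA_S1_R1_001.fastq",
    "AAAAA_S1_R2_001.fastq",
    "BBBBB_S2_R1_001.fastq",
    "BBBBB_S2_R2_001.fastq",
]
print(names_from_r1(files))     # ['AAAAA_S1', 'BBBBB_S2']
print(sample_to_prefix(files))  # {'AAAAA': 'AAAAA_S1', 'BBBBB': 'BBBBB_S2'}
```

With such a mapping, an input function could look up the prefix for wildcards.sample to locate the _S1 files while {sample} itself stays AAAAA; this is a possible variant, untested here.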

The output file names are:

AAAAA_S1_mapped.bam
BBBBB_S2_mapped.bam

I want the output files to be:

AAAAA_mapped.bam
BBBBB_mapped.bam

How can I either change the output filename, or rename the files before or after the bwa rule?
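Part of the difficulty is that Snakemake infers the value of {sample} from the requested output path and then substitutes that same value into the input patterns. If rule all asks for output/AAAAA_mapped.bam, {sample} becomes AAAAA and rule bwa looks for input/AAAAA_R1_001.fastq, which does not exist. The sketch below imitates that matching step in plain Python. wildcards_from_output is a hypothetical helper; it assumes Snakemake's default wildcard pattern (.+) and ignores wildcard constraints and other details of the real implementation.

```python
import re


def wildcards_from_output(pattern, path):
    """Roughly how '{name}' placeholders are matched against a requested file.

    Simplification: each wildcard becomes a greedy '.+' group (Snakemake's
    default); all other characters are matched literally.
    """
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>.+)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    match = re.fullmatch(regex, path)
    return match.groupdict() if match else None


wc = wildcards_from_output("output/{sample}_mapped.bam", "output/AAAAA_mapped.bam")
print(wc)  # {'sample': 'AAAAA'}

# The same value is plugged into the input pattern of rule bwa:
r1 = "input/{sample}_R1_001.fastq".format(**wc)
print(r1)  # input/AAAAA_R1_001.fastq (the real file is AAAAA_S1_R1_001.fastq)
```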


Solution

  • Try this:

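    import pathlib
    
    indir = pathlib.Path("input")
    # "S*" rather than "S?" so that S10 and above are matched too
    paths = indir.glob("*_S*_R?_001.fastq")
    # sample name = everything before the first "_" (e.g. "AAAAA")
    samples = sorted({x.stem.split("_")[0] for x in paths})
    
    rule all:
        input:
            expand("output/{sample}_mapped.bam", sample=samples)
    
    
    def find_fastqs(wildcards):
        # Look up the real files (including the _S<n> part) for this sample.
        # Sorting puts R1 before R2, the order bwa mem expects.
        pattern = f"{wildcards.sample}_S*_R?_001.fastq"
        return sorted(str(x) for x in indir.glob(pattern))
    
    
    rule bwa:
        input:
            fastqs = find_fastqs
        output:
            mapped = "output/{sample}_mapped.bam"
        params:
            ref = "refs/AF086833.fa"
        shell:
            "bwa mem {params.ref} {input.fastqs} | samtools sort > {output.mapped}"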
    

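    This uses an input function to find the correct FASTQ files for rule bwa: the {sample} wildcard only has to match AAAAA, and the function looks up the actual AAAAA_S1_R1_001.fastq and AAAAA_S1_R2_001.fastq files. There might be a more elegant solution, but I can't see it right now. I think this should work, though.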

    (Edited to reflect OP's edit.)
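The sample discovery and input function from the answer can be exercised without Snakemake by creating empty files in a temporary directory. In this minimal sketch the helpers take the directory as an argument, and types.SimpleNamespace stands in for the wildcards object that Snakemake passes to an input function; both are assumptions for illustration. The glob uses S* so that S10 and above would also match.

```python
import pathlib
import tempfile
from types import SimpleNamespace


def discover_samples(indir):
    # Sample name = everything before the first "_"; "S*" also matches S10+.
    paths = indir.glob("*_S*_R?_001.fastq")
    return sorted({x.stem.split("_")[0] for x in paths})


def find_fastqs(indir, wildcards):
    # Sorting puts ..._R1_001.fastq before ..._R2_001.fastq (bwa mem order).
    pattern = f"{wildcards.sample}_S*_R?_001.fastq"
    return sorted(str(x) for x in indir.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    indir = pathlib.Path(tmp)
    for name in ["AAAAA_S1_R1_001.fastq", "AAAAA_S1_R2_001.fastq",
                 "BBBBB_S2_R1_001.fastq", "BBBBB_S2_R2_001.fastq"]:
        (indir / name).touch()  # empty placeholder files

    samples = discover_samples(indir)
    print(samples)  # ['AAAAA', 'BBBBB']

    # SimpleNamespace mimics the wildcards object Snakemake would pass in.
    fastqs = find_fastqs(indir, SimpleNamespace(sample="AAAAA"))
    print([pathlib.Path(f).name for f in fastqs])
    # ['AAAAA_S1_R1_001.fastq', 'AAAAA_S1_R2_001.fastq']
```

Because the sample name is taken as everything before the first underscore, this approach assumes sample names themselves contain no underscores; a name like MY_SAMPLE_S1_R1_001.fastq would need a different parsing rule.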