I think I have a simple problem but I don't how to solve it.
My input folder contains files like this:
AAAAA_S1_R1_001.fastq
AAAAA_S1_R2_001.fastq
BBBBB_S2_R1_001.fastq
BBBBB_S2_R2_001.fastq
My snakemake code:
import glob
samples = [os.path.basename(x) for x in sorted(glob.glob("input/*.fastq"))]
name = []
for x in samples:
if "_R1_" in x:
name.append(x.split("_R1_")[0])
NAME = name
rule all:
input:
expand("output/{sp}_mapped.bam", sp=NAME),
rule bwa:
input:
R1 = "input/{sample}_R1_001.fastq",
R2 = "input/{sample}_R2_001.fastq"
output:
mapped = "output/{sample}_mapped.bam"
params:
ref = "refs/AF086833.fa"
run:
shell("bwa mem {params.ref} {input.R1} {input.R2} | samtools sort > {output.mapped}")
The output file names are:
AAAAA_S1_mapped.bam
BBBBB_S2_mapped.bam
I want the output file to be:
AAAAA_mapped.bam
BBBBB_mapped.bam
How can I or change the outputname or rename the files before or after the bwa rule.
Try this:
import pathlib
indir = pathlib.Path("input")
paths = indir.glob("*_S?_R?_001.fastq")
samples = set([x.stem.split("_")[0] for x in paths])
rule all:
input:
expand("output/{sample}_mapped.bam", sample=samples)
def find_fastqs(wildcards):
fastqs = [str(x) for x in indir.glob(f"{wildcards.sample}_*.fastq")]
return sorted(fastqs)
rule bwa:
input:
fastqs = find_fastqs
output:
mapped = "output/{sample}_mapped.bam"
params:
ref = "refs/AF086833.fa"
shell:
"bwa mem {params.ref} {input.fastqs} | samtools sort > {output.mapped}"
Uses an input function to find the correct samples for rule bwa
. There might be a more elegant solution, but I can't see it right now. I think this should work, though.
(Edited to reflect OP's edit.)