I am new to Snakemake and have an issue with the following code, which should take 9 FASTQ files one after another and run FastQC on each.
smp should take the values:
UG1_S12 UG2_S13 UG3_S14 UR1_S1 UR2_S2 UR3_S3 UY1_S6 UY2_S7 UY3_S8
This works when I run:
SAMPLES, = glob_wildcards("reads/merged_s{smp}_L001.fastq.gz")
NB_SAMPLES = len(SAMPLES)
for smp in SAMPLES:
    message("Sample " + smp + " will be processed")
message("N= " + str(NB_SAMPLES))
The problem is the substitution of {smp} below: the fastqc command and the outputs use UY2_S7, but in the mv commands {smp} is replaced by UY3_S8.
How should I make sure that the same substitution is used in both subcommands of the same rule?
My current code (inspired by):
SAMPLES, = glob_wildcards("reads/merged_s{smp}_L001.fastq.gz")
rule all:
    input:
        expand("reads/merged_s{smp}_L001.fastq.gz", smp=SAMPLES),
        "results/multiqc.html"

rule fastqc:
    """
    Run FastQC on each FASTQ file.
    """
    input:
        "reads/merged_s{smp}_L001.fastq.gz"
    output:
        "results/{smp}_fastqc.html",
        "intermediate/{smp}_fastqc.zip"
    version: "1.0"
    shadow: "minimal"
    threads: 8
    shell:
        """
        # Run fastQC and save the output to the current directory
        fastqc {input} -t {threads} -q -d . -o .
        # Move the files which are used in the workflow
        mv merged_s{smp}_L001_fastqc.html {output[0]}
        mv merged_s{smp}_L001_fastqc.zip {output[1]}
        """
The error:
Error in rule fastqc:
jobid: 0
output: results/UY2_S7_fastqc.html, intermediate/UY2_S7_fastqc.zip
RuleException:
CalledProcessError in line 60 of Snakefile:
Command ' set -euo pipefail;
# Run fastQC and save the output to the current directory
fastqc reads/merged_sUY2_S7_L001.fastq.gz -t 8 -q -d . -o .
# Move the files which are used in the workflow
mv merged_sUY3_S8_L001_fastqc.html results/UY2_S7_fastqc.html
mv merged_sUY3_S8_L001_fastqc.zip intermediate/UY2_S7_fastqc.zip ' returned non-zero exit status 130.
File "Snakefile", line 60, in __rule_fastqc
File "/opt/biotools/miniconda2/envs/snakemake-tutorial/lib/python3.6/concurrent/futures/thread.py", line 56, in run
If you want to use a wildcard in the shell command, you have to write {wildcards.smp}.
What is probably happening is that the bare {smp} in the shell command is not treated as the rule's wildcard but picks up the variable smp, which still holds the value from the last iteration of the for loop above (UY3_S8). So change:
    shell:
        """
        # Run fastQC and save the output to the current directory
        fastqc {input} -t {threads} -q -d . -o .
        # Move the files which are used in the workflow
        mv merged_s{smp}_L001_fastqc.html {output[0]}
        mv merged_s{smp}_L001_fastqc.zip {output[1]}
        """
into:
    shell:
        """
        # Run fastQC and save the output to the current directory
        fastqc {input} -t {threads} -q -d . -o .
        # Move the files which are used in the workflow
        mv merged_s{wildcards.smp}_L001_fastqc.html {output[0]}
        mv merged_s{wildcards.smp}_L001_fastqc.zip {output[1]}
        """
I have not checked the rest of the code.