snakemake substitution issue


I am new to Snakemake and have an issue with the following code, which should take 9 FASTQ files one after another and run FastQC on each.

smp should take the values:

UG1_S12 UG2_S13 UG3_S14 UR1_S1 UR2_S2 UR3_S3 UY1_S6 UY2_S7 UY3_S8

This works when I run:

SAMPLES, = glob_wildcards("reads/merged_s{smp}_L001.fastq.gz")
NB_SAMPLES = len(SAMPLES)

for smp in SAMPLES:
  message("Sample " + smp + " will be processed")
message("N= " + str(NB_SAMPLES))

The problem is the substitution of {smp} in the rule below: the job runs for UY2_S7 (as the fastqc command and the output names show), but {smp} in the mv commands is replaced by UY3_S8.

How should I make sure that the same substitution is used in both subcommands of the same rule?

my current code (inspired by):

SAMPLES, = glob_wildcards("reads/merged_s{smp}_L001.fastq.gz")

rule all: 
  input: 
        expand("reads/merged_s{smp}_L001.fastq.gz", smp=SAMPLES),
        "results/multiqc.html"

rule fastqc:
    """
    Run FastQC on each FASTQ file.
    """
    input:
        "reads/merged_s{smp}_L001.fastq.gz"
    output:
        "results/{smp}_fastqc.html",
        "intermediate/{smp}_fastqc.zip"
    version: "1.0"
    shadow: "minimal"
    threads: 8
    shell:
        """
        # Run fastQC and save the output to the current directory
        fastqc {input} -t {threads} -q -d . -o .

        # Move the files which are used in the workflow
        mv merged_s{smp}_L001_fastqc.html {output[0]}
        mv merged_s{smp}_L001_fastqc.zip {output[1]}
        """

the error:

Error in rule fastqc:
    jobid: 0
    output: results/UY2_S7_fastqc.html, intermediate/UY2_S7_fastqc.zip

RuleException:
CalledProcessError in line 60 of Snakefile:
Command ' set -euo pipefail;  
        # Run fastQC and save the output to the current directory
        fastqc reads/merged_sUY2_S7_L001.fastq.gz -t 8 -q -d . -o .

        # Move the files which are used in the workflow
        mv merged_sUY3_S8_L001_fastqc.html results/UY2_S7_fastqc.html
        mv merged_sUY3_S8_L001_fastqc.zip intermediate/UY2_S7_fastqc.zip ' returned non-zero exit status 130.
  File "Snakefile", line 60, in __rule_fastqc
  File "/opt/biotools/miniconda2/envs/snakemake-tutorial/lib/python3.6/concurrent/futures/thread.py", line 56, in run

Solution

  • To use a wildcard in the shell command, you have to write {wildcards.smp}.
    What is probably happening is that the bare {smp} is looked up among the Snakefile's global variables, where smp still holds the value from the last iteration of the for loop above (UY3_S8), rather than the wildcard value of the current job. So change:

    shell:
        """
        # Run fastQC and save the output to the current directory
        fastqc {input} -t {threads} -q -d . -o .
    
        # Move the files which are used in the workflow
        mv merged_s{smp}_L001_fastqc.html {output[0]}
        mv merged_s{smp}_L001_fastqc.zip {output[1]}
        """
    

    into:

    shell:
        """
        # Run fastQC and save the output to the current directory
        fastqc {input} -t {threads} -q -d . -o .
    
        # Move the files which are used in the workflow
        mv merged_s{wildcards.smp}_L001_fastqc.html {output[0]}
        mv merged_s{wildcards.smp}_L001_fastqc.zip {output[1]}
        """
    

    I have not checked the rest of the code.
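
The lookup behaviour described above can be imitated in plain Python. This is only a minimal sketch, not Snakemake's actual implementation: `render` is a hypothetical helper that formats a command template against a copy of the module-level names plus a per-job `wildcards` object, which is roughly how a bare `{smp}` can end up bound to a leftover loop variable.

```python
from types import SimpleNamespace

SAMPLES = ["UY2_S7", "UY3_S8"]

# A module-level loop leaves `smp` bound to the last sample once it finishes.
for smp in SAMPLES:
    pass

def render(template, global_names, wildcards):
    """Hypothetical stand-in for shell-command formatting:
    module-level names plus the job's own `wildcards` object."""
    namespace = dict(global_names)
    namespace["wildcards"] = wildcards
    return template.format(**namespace)

# Pretend the current job was created for sample UY2_S7.
job_wildcards = SimpleNamespace(smp="UY2_S7")
module_names = {"smp": smp}  # what the loop left behind

# Bare {smp} resolves to the leftover loop variable, not the job's sample.
leaky = render("mv merged_s{smp}_L001_fastqc.html", module_names, job_wildcards)
# {wildcards.smp} resolves to the value that belongs to this job.
fixed = render("mv merged_s{wildcards.smp}_L001_fastqc.html", module_names, job_wildcards)

print(leaky)
print(fixed)
```

If this is indeed the cause, renaming the loop variable in the Snakefile (or dropping the loop) would turn the silent mix-up into an error about an unknown name, but using {wildcards.smp} is the actual fix.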