Search code examples
pythonsnakemake

Snakemake: trimmomatic wrapper attribute error


I have a snakemake pipeline that looks like this:

configfile: "./config.yaml"
IN_DIR = config["in_dir"]
SAMPLES = config["samples"]

rule all:
    input: 
        expand("{sample}_Aligned.sortedByCoord.out.bam", sample=SAMPLES)

rule trimmomatic_pe:
    message:
        """
        Pre-processing raw reads with trimmomatic. Trimming low quality reads and adapter sequences. Running QC on trimmed reads.
        """
    input:
        r1 = expand("{in_dir}/{{sample}}_R1_001.fastq.gz", in_dir=IN_DIR),
        r2 = expand("{in_dir}/{{sample}}_R2_001.fastq.gz", in_dir=IN_DIR)
    params:
        trimmer = config["parameters"]["trim"],
        extra = ""
    output:
        r1 = "tmp/{sample}_R1_trimmed.fastq.gz",
        r2 = "tmp/{sample}_R2_trimmed.fastq.gz",
        r1_unpaired = "tmp/{sample}_R1_unpaired_trimmed.fastq.gz",
        r2_unpaired = "tmp/{sample}_R2_unpaired_trimmed.fastq.gz"
    threads:
        2
    wrapper:
        "0.74.0/bio/trimmomatic/pe"

rule map_reads:
    message:
        """
        Mapping trimmed reads to host genome
        """
    input:
        r1 = "tmp/{sample}_R1_trimmed.fastq.gz",
        r2 = "tmp/{sample}_R2_trimmed.fastq.gz"
    params:
        annotation = config["annotation_file"]
    output:
        "{sample}_Aligned.sortedByCoord.out.bam"
    shell:
        """
        STAR \
            --runThreadN 16 \
            --sjdbGTFfile {params.annotation} \
            --sjdbOverhang 149 \
            --outFilterType BySJout \
            --outFilterMultimapNmax 10 \
            --alignSJoverhangMin 5 \
            --alignSJDBoverhangMin 1 \
            --outFilterMismatchNmax 999 \
            --outFilterMismatchNoverReadLmax 0.04 \
            --alignIntronMin 20 \
            --alignIntronMax 1000000 \
            --alignMatesGapMax 1000000 \
            --outFilterIntronMotifs RemoveNoncanonicalUnannotated \
            --outFileNamePrefix {wildcards.sample}_ \
            --outSAMtype BAM SortedByCoordinate \
            --runMode alignReads \
            --genomeDir ./index \
            --readFilesIn {input.r1} {input.r2}
        """

When I run snakemake -np the DAG is correctly made, but I keep getting this error that I don't know how to interpret when I try to actually run the pipeline with snakemake --cores 2:

[Thu Apr 29 15:12:21 2021]
Job 1: 
        Pre-processing raw reads with trimmomatic. Trimming low quality reads and adapter sequences. Running QC on trimmed reads.
        

Traceback (most recent call last):
  File "/Users/user/Documents/postdoc_projects/invert/.snakemake/scripts/tmp7yzyzwru.wrapper.py", line 88, in <module>
    input_files, output_files, snakemake.threads
  File "/Users/user/Documents/postdoc_projects/invert/.snakemake/scripts/tmp7yzyzwru.wrapper.py", line 27, in distribute_threads
    gzipped_input_files = sum(1 for file in input_files if file.endswith(".gz"))
  File "/Users/user/Documents/postdoc_projects/invert/.snakemake/scripts/tmp7yzyzwru.wrapper.py", line 27, in <genexpr>
    gzipped_input_files = sum(1 for file in input_files if file.endswith(".gz"))
AttributeError: 'Namedlist' object has no attribute 'endswith'
[Thu Apr 29 15:12:22 2021]
Error in rule trimmomatic_pe:
    jobid: 1
    output: tmp/4_12hr_Ciliated_4_S4_R1_trimmed.fastq.gz, tmp/4_12hr_Ciliated_4_S4_R2_trimmed.fastq.gz, tmp/4_12hr_Ciliated_4_S4_R1_unpaired_trimmed.fastq.gz, tmp/4_12hr_Ciliated_4_S4_R2_unpaired_trimmed.fastq.gz

RuleException:
CalledProcessError in line 29 of /Users/user/Documents/postdoc_projects/invert/Snakefile:
Command 'set -euo pipefail;  /Users/user/opt/miniconda3/envs/invert/bin/python3.6 /Users/user/Documents/postdoc_projects/invert/.snakemake/scripts/tmp7yzyzwru.wrapper.py' returned non-zero exit status 1.
  File "/Users/user/Documents/postdoc_projects/invert/Snakefile", line 29, in __rule_trimmomatic_pe
  File "/Users/user/opt/miniconda3/envs/invert/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

Is the pipeline not identifying my sample correctly? The attribute error seems like that's the case, but this is my config file structure:

in_dir: data #Directory containing raw fastq files from RNAseq
samples: "4_12hr_Ciliated_4_S4" #sample name prefix
annotation_file: ref_files/Homo_sapiens.GRCh38.103.gtf #Directory containing the viral host genome annotation in .gtf format

parameters:
  trim: ["TRAILING:3 ILLUMINACLIP:ref_files/TruSeq3-PE-2.fa"] #trimmomatic parameters

What am I missing?


Solution

  • The expand function returns a list. By setting the input files to a list instead of a string, you are confusing the script. For defining r1 and r2, you should use something that returns a string instead. I would suggest the string's format() function or an f-string.

    Change:

    r1 = expand("{in_dir}/{{sample}}_R1_001.fastq.gz", in_dir=IN_DIR),
    

    to:

    r1 = "{in_dir}/{{sample}}_R1_001.fastq.gz".format(in_dir=IN_DIR),
    

    or even:

    r1 = f"{IN_DIR}/{{sample}}_R1_001.fastq.gz",
    

    ...and do the same for r2