Snakemake fasterq-dump wrapper AttributeError: 'Wildcards' object has no attribute 'accession'


I am trying to use the fasterq-dump wrapper in my Snakemake workflow to download paired-end fastq.gz files. Here is my Snakefile:

# read the SRA run table (comma-separated); the 'Run' column holds the SRR* accession numbers
import pandas as pd
df = pd.read_csv('SraRunTable.txt', sep=',', header=0)

# collect all accession numbers into a list
SAMPLES = df['Run'].tolist()

# snakemake workflow starts here
rule all:
    input:
        expand("/data/fastq/{sample}_1.fastq.gz", sample=SAMPLES)

rule get_fastq_pe_gz:
    output:
        # the wildcard name must be accession
        "/data/fastq/{sample}_1.fastq.gz",
        "/data/fastq/{sample}_2.fastq.gz",
    log:
        "/data/logs/{sample}.log"
    params:
        extra="--skip-technical"
    threads: 20
    wrapper:
        "v1.7.0/bio/sra-tools/fasterq-dump"

After running it with conda (snakemake -s fasterq-dump.snake --cores 20 --use-conda), I received an AttributeError that I cannot figure out. Any suggestions or solutions are appreciated!

Here is the complete log including the error message:

Building DAG of jobs...
Creating conda environment https://github.com/snakemake/snakemake-wrappers/raw/v1.7.0/bio/sra-tools/fasterq-dump/environment.yaml...
Downloading and installing remote packages.
Environment for https://github.com/snakemake/snakemake-wrappers/raw/v1.7.0/bio/sra-tools/fasterq-dump/environment.yaml created (location: .snakemake/conda/fab035359fa42a09dfad78160e9b8543)
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job stats:
job                count    min threads    max threads
---------------  -------  -------------  -------------
all                    1              1              1
get_fastq_pe_gz      422             20             20
total                423              1             20

Select jobs to execute...

[Wed Jun 15 17:10:30 2022]
rule get_fastq_pe_gz:
    output: /data/scratch/yaochung/Khrameeva/fastq/SRR8750458_1.fastq.gz, /data/scratch/yaochung/Khrameeva/fastq/SRR8750458_2.fastq.gz
    log: /data/scratch/yaochung/Khrameeva/logs/SRR8750458.log
    jobid: 62
    reason: Missing output files: /data/scratch/yaochung/Khrameeva/fastq/SRR8750458_1.fastq.gz
    wildcards: sample=SRR8750458
    threads: 20
    resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/fab035359fa42a09dfad78160e9b8543
Activating conda environment: .snakemake/conda/fab035359fa42a09dfad78160e9b8543
Traceback (most recent call last):
  File "/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/scripts/tmp4ip6wnot.wrapper.py", line 45, in <module>
    shell(
  File "/home/yaochung41/anaconda3/envs/snakemake/lib/python3.10/site-packages/snakemake/shell.py", line 139, in __new__
    cmd = format(cmd, *args, stepout=2, **kwargs)
  File "/home/yaochung41/anaconda3/envs/snakemake/lib/python3.10/site-packages/snakemake/utils.py", line 430, in format
    return fmt.format(_pattern, *args, **variables)
  File "/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/conda/fab035359fa42a09dfad78160e9b8543/lib/python3.10/string.py", line 161, in format
    return self.vformat(format_string, args, kwargs)
  File "/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/conda/fab035359fa42a09dfad78160e9b8543/lib/python3.10/string.py", line 165, in vformat
    result, _ = self._vformat(format_string, args, kwargs, used_args, 2)
  File "/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/conda/fab035359fa42a09dfad78160e9b8543/lib/python3.10/string.py", line 205, in _vformat
    obj, arg_used = self.get_field(field_name, args, kwargs)
  File "/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/conda/fab035359fa42a09dfad78160e9b8543/lib/python3.10/string.py", line 276, in get_field
    obj = getattr(obj, i)
AttributeError: 'Wildcards' object has no attribute 'accession'
[Wed Jun 15 17:10:34 2022]
Error in rule get_fastq_pe_gz:
    jobid: 62
    output: /data/scratch/yaochung/Khrameeva/fastq/SRR8750458_1.fastq.gz, /data/scratch/yaochung/Khrameeva/fastq/SRR8750458_2.fastq.gz
    log: /data/scratch/yaochung/Khrameeva/logs/SRR8750458.log (check log file(s) for error message)
    conda-env: /data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/conda/fab035359fa42a09dfad78160e9b8543

RuleException:
CalledProcessError in line 25 of /data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/fasterq-dump.snake:
Command 'source /home/yaochung41/anaconda3/bin/activate '/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/conda/fab035359fa42a09dfad78160e9b8543'; set -euo pipefail;  python /data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/scripts/tmp4ip6wnot.wrapper.py' returned non-zero exit status 1.
  File "/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/fasterq-dump.snake", line 25, in __rule_get_fastq_pe_gz
  File "/home/yaochung41/anaconda3/envs/snakemake/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-06-15T170843.109776.snakemake.log

Solution

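  • If you look at the wrapper's code, its shell command references the wildcard accession, which is why the traceback ends with 'Wildcards' object has no attribute 'accession'. Your rule names the wildcard sample instead. Rename {sample} to {accession} everywhere it appears: in both output paths, in the log path (output and log files of a rule must use the same wildcards), and in rule all, i.e. expand("/data/fastq/{accession}_1.fastq.gz", accession=SAMPLES). The comment in your own rule already says it: the wildcard name must be accession.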
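
To see why the wildcard name matters, here is a minimal sketch in plain Python that mimics the failing lookup. It is not the wrapper's actual code: make_job, render and TEMPLATE are hypothetical names, SimpleNamespace stands in for Snakemake's Wildcards object, and str.format stands in for Snakemake's own formatter. The field name in the template is modeled on the attribute named in the traceback.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the `snakemake` object a wrapper script receives.
# Only attribute lookup on `wildcards` matters for this illustration.
def make_job(**wildcards):
    return SimpleNamespace(wildcards=SimpleNamespace(**wildcards))

# Assumed template: only the wildcard lookup is modeled, not the wrapper's
# full fasterq-dump command line.
TEMPLATE = "fasterq-dump {snakemake.wildcards.accession}"

def render(job):
    # Format fields with dots are resolved via getattr, as in the traceback.
    return TEMPLATE.format(snakemake=job)

# Wildcard named `sample`, as in the question: the lookup fails.
try:
    render(make_job(sample="SRR8750458"))
except AttributeError as err:
    print("fails:", err)

# Wildcard named `accession`: the command renders.
print(render(make_job(accession="SRR8750458")))
```

The first call raises an AttributeError because no attribute called accession exists on the wildcards object; the second succeeds. In the Snakefile, the equivalent change is the rename of {sample} to {accession} described above.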