I am trying to use the fasterq-dump wrapper in my Snakemake workflow to download paired-end fastq.gz files. Here is my Snakefile:
# read a .txt file containing many SRR* accession numbers
import pandas as pd
df = pd.read_csv('SraRunTable.txt', sep=',', header=0)
# append all accession numbers to a list
SAMPLES = []
for i in df['Run']:
    SAMPLES.append(i)
# snakemake workflow starts here
rule all:
    input:
        expand("/data/fastq/{sample}_1.fastq.gz", sample=SAMPLES)
rule get_fastq_pe_gz:
    output:
        # the wildcard name must be accession
        "/data/fastq/{sample}_1.fastq.gz",
        "/data/fastq/{sample}_2.fastq.gz",
    log:
        "/data/logs/{sample}.log"
    params:
        extra="--skip-technical"
    threads: 20
    wrapper:
        "v1.7.0/bio/sra-tools/fasterq-dump"
After running it with conda (snakemake -s fasterq-dump.snake --cores 20 --use-conda), I received an AttributeError that I cannot figure out. Any suggestions or solutions are appreciated!
Here is the complete log including the error message:
Building DAG of jobs...
Creating conda environment https://github.com/snakemake/snakemake-wrappers/raw/v1.7.0/bio/sra-tools/fasterq-dump/environment.yaml...
Downloading and installing remote packages.
Environment for https://github.com/snakemake/snakemake-wrappers/raw/v1.7.0/bio/sra-tools/fasterq-dump/environment.yaml created (location: .snakemake/conda/fab035359fa42a09dfad78160e9b8543)
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job stats:
job                count    min threads    max threads
---------------  -------  -------------  -------------
all                    1              1              1
get_fastq_pe_gz      422             20             20
total                423              1             20
Select jobs to execute...
[Wed Jun 15 17:10:30 2022]
rule get_fastq_pe_gz:
output: /data/scratch/yaochung/Khrameeva/fastq/SRR8750458_1.fastq.gz, /data/scratch/yaochung/Khrameeva/fastq/SRR8750458_2.fastq.gz
log: /data/scratch/yaochung/Khrameeva/logs/SRR8750458.log
jobid: 62
reason: Missing output files: /data/scratch/yaochung/Khrameeva/fastq/SRR8750458_1.fastq.gz
wildcards: sample=SRR8750458
threads: 20
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/fab035359fa42a09dfad78160e9b8543
Activating conda environment: .snakemake/conda/fab035359fa42a09dfad78160e9b8543
Traceback (most recent call last):
File "/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/scripts/tmp4ip6wnot.wrapper.py", line 45, in <module>
shell(
File "/home/yaochung41/anaconda3/envs/snakemake/lib/python3.10/site-packages/snakemake/shell.py", line 139, in __new__
cmd = format(cmd, *args, stepout=2, **kwargs)
File "/home/yaochung41/anaconda3/envs/snakemake/lib/python3.10/site-packages/snakemake/utils.py", line 430, in format
return fmt.format(_pattern, *args, **variables)
File "/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/conda/fab035359fa42a09dfad78160e9b8543/lib/python3.10/string.py", line 161, in format
return self.vformat(format_string, args, kwargs)
File "/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/conda/fab035359fa42a09dfad78160e9b8543/lib/python3.10/string.py", line 165, in vformat
result, _ = self._vformat(format_string, args, kwargs, used_args, 2)
File "/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/conda/fab035359fa42a09dfad78160e9b8543/lib/python3.10/string.py", line 205, in _vformat
obj, arg_used = self.get_field(field_name, args, kwargs)
File "/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/conda/fab035359fa42a09dfad78160e9b8543/lib/python3.10/string.py", line 276, in get_field
obj = getattr(obj, i)
AttributeError: 'Wildcards' object has no attribute 'accession'
[Wed Jun 15 17:10:34 2022]
Error in rule get_fastq_pe_gz:
jobid: 62
output: /data/scratch/yaochung/Khrameeva/fastq/SRR8750458_1.fastq.gz, /data/scratch/yaochung/Khrameeva/fastq/SRR8750458_2.fastq.gz
log: /data/scratch/yaochung/Khrameeva/logs/SRR8750458.log (check log file(s) for error message)
conda-env: /data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/conda/fab035359fa42a09dfad78160e9b8543
RuleException:
CalledProcessError in line 25 of /data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/fasterq-dump.snake:
Command 'source /home/yaochung41/anaconda3/bin/activate '/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/conda/fab035359fa42a09dfad78160e9b8543'; set -euo pipefail; python /data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/.snakemake/scripts/tmp4ip6wnot.wrapper.py' returned non-zero exit status 1.
File "/data/scratch/yaochung/TEKRABber_thesis/pipelines/fasterq-dump/fasterq-dump.snake", line 25, in __rule_get_fastq_pe_gz
File "/home/yaochung41/anaconda3/envs/snakemake/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-06-15T170843.109776.snakemake.log
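If you look at the wrapper's code, the shell command expects the files to use the wildcard accession instead of sample, as in your rule. You should be able to rename sample to accession in the rule's output and log filenames and have it work. The expand() call in rule all can keep its own placeholder name, since it only generates concrete file names.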
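To see why the wildcard name matters, here is a minimal sketch in plain Python that reproduces the failure outside Snakemake. The Wildcards class and the TEMPLATE string below are hypothetical stand-ins, not the real Snakemake class or the wrapper's actual command; the only assumption taken from the traceback is that the wrapper formats a string that looks up an attribute named accession on the wildcards object.

```python
class Wildcards:
    """Hypothetical stand-in for Snakemake's wildcards object (not the real class)."""

    def __init__(self, **values):
        # Expose each wildcard as an attribute, e.g. Wildcards(sample="x").sample
        for name, value in values.items():
            setattr(self, name, value)


# Assumed shape of the wrapper's command template: it refers to the
# wildcard by the fixed name `accession`.
TEMPLATE = "fasterq-dump {wildcards.accession}"


def try_render(wildcards):
    """Format the template, returning the error text instead of raising."""
    try:
        return TEMPLATE.format(wildcards=wildcards)
    except AttributeError as err:
        return f"AttributeError: {err}"


# Wildcard named `sample`, as in the rule from the question: the lookup fails.
failing = try_render(Wildcards(sample="SRR8750458"))

# Wildcard named `accession`: the lookup succeeds.
working = try_render(Wildcards(accession="SRR8750458"))

print(failing)
print(working)
```

In the Snakefile, this corresponds to writing the output patterns as /data/fastq/{accession}_1.fastq.gz and /data/fastq/{accession}_2.fastq.gz, and the log as /data/logs/{accession}.log, so that the name the wrapper looks up actually exists.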