I'm having some trouble running Snakemake. I want to perform quality control on some bulk RNA-Seq samples using FastQC. I've written the code so that all files matching the pattern {sample}_{replicate}.fastq.gz are used as input, where {sample} is the sample id (e.g. SRR6974023) and {replicate} is 1 or 2. My short script follows:
configfile: "config.yaml"
rule all:
    input:
        expand("raw_qc/{sample}_{replicate}_fastqc.{extension}", sample=config["samples"], replicate=[1, 2], extension=["zip", "html"])
rule fastqc:
    input:
        rawread=expand("raw_data/{sample}_{replicate}.fastq.gz", sample=config["samples"], replicate=[1, 2])
    output:
        compress=expand("raw_qc/{sample}_{replicate}_fastqc.zip", sample=config["samples"], replicate=[1, 2]),
        net=expand("raw_qc/{sample}_{replicate}_fastqc.html", sample=config["samples"], replicate=[1, 2])
    threads:
        8
    params:
        path="raw_qc/"
    shell:
        "fastqc -t {threads} {input.rawread} -o {params.path}"
Just in case, the config.yaml is:
samples:
    SRR6974023
    SRR6974024
The raw_data directory with my files looks like this:
SRR6974023_1.fastq.gz SRR6974023_2.fastq.gz SRR6974024_1.fastq.gz SRR6974024_2.fastq.gz
Finally, when I run the script, I always see the same error:
Building DAG of jobs...
MissingInputException in line 8 of /home/user/path/Snakefile:
Missing input files for rule fastqc:
raw_data/SRR6974023 SRR6974024_2.fastq.gz
raw_data/SRR6974023 SRR6974024_1.fastq.gz
It correctly sees only the last files, in this case SRR6974024_1.fastq.gz and SRR6974024_2.fastq.gz. The other sample, however, shows up only as SRR6974023. How can I solve this issue? I would appreciate some help. Thank you all!
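The YAML is not configured correctly. Without list markers, the two lines under samples are read as a single string, "SRR6974023 SRR6974024", which is why that text appears in the missing file names. Each row should start with - to turn it into a list item: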
samples:
- SRR6974023
- SRR6974024
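To see why the dashes matter, here is a minimal Python sketch of what happens to the file names in each case. Note that expand_sketch is a hypothetical, simplified stand-in for Snakemake's expand(), not the real implementation, and the two sample values are hard-coded on the assumption that the YAML loader returns one space-joined string for the original config and a list for the corrected one.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Hypothetical, simplified stand-in for Snakemake's expand()."""
    # Treat a bare string as one single value instead of iterating over it.
    values = {
        key: [val] if isinstance(val, str) else list(val)
        for key, val in wildcards.items()
    }
    keys = list(values)
    # Build one path per combination of wildcard values.
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(values[k] for k in keys))
    ]


# Assumed result of loading the original config.yaml: a single string.
broken_samples = "SRR6974023 SRR6974024"
# Assumed result of loading the corrected config.yaml: a list of two ids.
fixed_samples = ["SRR6974023", "SRR6974024"]

pattern = "raw_data/{sample}_{replicate}.fastq.gz"
broken_paths = expand_sketch(pattern, sample=broken_samples, replicate=[1, 2])
fixed_paths = expand_sketch(pattern, sample=fixed_samples, replicate=[1, 2])

print(broken_paths)  # two paths containing "SRR6974023 SRR6974024"
print(fixed_paths)   # four paths, one per sample and replicate
```

With the string, only two paths are produced and both contain the space-joined name from the error message. With the list, each sample id gets its own pair of files.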