I am writing a Snakemake file that performs multiple operations on multiple samples. Having validated the workflow on my local computer, I am now working on running it on a cluster.
My first two rules are independent of one another: the first uses fastqc and the other bwa mem. These two rules look like this (at this point I am only calling the workflow on a single sample, SAMPLE = 'NIPT-PearlPPlasma-03-PPx_S3downSample'):
rule fastQC:
    input:
        R1 = FQDIR + "{sample}_R1_001.fastq.gz",
        R2 = FQDIR + "{sample}_R2_001.fastq.gz"
    output:
        directory("fastQC/{sample}")
    conda:
        "envs/NIPTlibPrep.yaml"
    log:
        "logs/fastQC/{sample}.log"  # log was giving an error when running at the command line
    shell:
        # 2> {log} at the end of the command removed
        # See wrapper at https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/fastqc.html
        "mkdir -p fastQC/{wildcards.sample} | fastqc --outdir fastQC/{wildcards.sample} -f fastq {input.R1} {input.R2}"
rule bwa_map:
    input:
        R1 = FQDIR + "{sample}_R1_001.fastq.gz",
        R2 = FQDIR + "{sample}_R2_001.fastq.gz",
        REF = config['ref']
    output:
        # wrap output in temp
        "aligned/{sample}.bam"
    log:
        "logs/bwa_mem/{sample}.log"
    conda:
        "envs/NIPTlibPrep.yaml"
    shell:
        "bwa mem {input.REF} {input.R1} {input.R2} "
        "| samtools view -Sb - > {output} 2> {log}"
But when I call:
snakemake -p -s Snakefile_v4_ngs_bngs05b --cluster qsub -j 5 --use-conda
I get:
Error in rule bwa_map:
jobid: 10
output: aligned/NIPT-PearlPPlasma-03-PPx_S3downSample.bam
log: logs/bwa_mem/NIPT-PearlPPlasma-03-PPx_S3downSample.log (check log file(s) for error message)
conda-env: /nexusb/nipt/200311_A00154_0454_AHHHKMDRXX/testMetrics/outSnakeMake_test/.snakemake/conda/38107c2c
shell:
bwa mem /home/ngs/data/genomes/b37/human_g1k_v37.fasta /nexusb/Novaseq/200311_A00154_0454_AHHHKMDRXX/Unaligned/NIPT-PearlPPlasma-03-PPx_S3downSample_R1_001.fastq.gz /nexusb/Novaseq/200311_A00154_0454_AHHHKMDRXX/Unaligned/NIPT-PearlPPlasma-03-PPx_S3downSample_R2_001.fastq.gz | samtools view -Sb - > aligned/NIPT-PearlPPlasma-03-PPx_S3downSample.bam 2> logs/bwa_mem/NIPT-PearlPPlasma-03-PPx_S3downSample.log
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
cluster_jobid: Your job 381368 ("snakejob.bwa_map.10.sh") has been submitted
Error executing rule bwa_map on cluster (jobid: 10, external: Your job 381368 ("snakejob.bwa_map.10.sh") has been submitted, jobscript: /nexusb/nipt/200311_A00154_0454_AHHHKMDRXX/testMetrics/outSnakeMake_test/.snakemake/tmp.bnhr7qck/snakejob.bwa_map.10.sh). For error details see the cluster log and the log files of the involved rule(s).
[Wed Apr 8 17:21:45 2020]
Error in rule fastQC:
jobid: 1
output: fastQC/NIPT-PearlPPlasma-03-PPx_S3downSample
log: logs/fastQC/NIPT-PearlPPlasma-03-PPx_S3downSample.log (check log file(s) for error message)
conda-env: /nexusb/nipt/200311_A00154_0454_AHHHKMDRXX/testMetrics/outSnakeMake_test/.snakemake/conda/38107c2c
shell:
mkdir -p fastQC/NIPT-PearlPPlasma-03-PPx_S3downSample | fastqc --outdir fastQC/NIPT-PearlPPlasma-03-PPx_S3downSample -f fastq /nexusb/Novaseq/200311_A00154_0454_AHHHKMDRXX/Unaligned/NIPT-PearlPPlasma-03-PPx_S3downSample_R1_001.fastq.gz /nexusb/Novaseq/200311_A00154_0454_AHHHKMDRXX/Unaligned/NIPT-PearlPPlasma-03-PPx_S3downSample_R2_001.fastq.gz
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
cluster_jobid: Your job 381369 ("snakejob.fastQC.1.sh") has been submitted
Error executing rule fastQC on cluster (jobid: 1, external: Your job 381369 ("snakejob.fastQC.1.sh") has been submitted, jobscript: /nexusb/nipt/200311_A00154_0454_AHHHKMDRXX/testMetrics/outSnakeMake_test/.snakemake/tmp.bnhr7qck/snakejob.fastQC.1.sh). For error details see the cluster log and the log files of the involved rule(s).
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
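The "bash strict mode" note in the output above refers, as far as I understand it, to Snakemake prefixing shell commands with `set -euo pipefail`. Under `pipefail`, a pipeline fails as soon as any stage fails, not only the last one. The following is a minimal sketch of that behaviour; it assumes `bash` is on the PATH, and the helper name `exit_code` is made up for illustration:

```python
import subprocess


def exit_code(script: str) -> int:
    """Run a script with bash and return its exit status (hypothetical helper)."""
    return subprocess.run(["bash", "-c", script]).returncode


# `false` always fails, `cat` succeeds on the empty input it receives.
pipeline = "false | cat"

# Without strict mode, the pipeline's status is that of the last command.
plain = exit_code(pipeline)

# With strict mode, the failure of the first stage fails the whole pipeline.
strict = exit_code("set -euo pipefail; " + pipeline)

print("without strict mode:", plain)
print("with strict mode:", strict)
```

This means a failure in the first stage of either rule's pipe (for example a tool that is not found on the compute node) is enough to make the job exit non-zero, even if the last stage runs fine.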
Upon first execution of the workflow I noticed that the environment was created at .snakemake/conda (relative to the Snakefile). When I call the script a second time, without changing the conda directives, snakemake uses the same conda-env.
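The reuse of the same environment directory is consistent with naming the environment after a hash of the environment file: unchanged content gives the same name, so nothing is rebuilt. The sketch below only illustrates that idea. The function `env_dir_name` is hypothetical, and the exact hashing scheme Snakemake uses is an assumption here, not taken from its source:

```python
import hashlib


def env_dir_name(env_yaml_text: str, prefix: str = ".snakemake/conda") -> str:
    """Derive a directory name from the environment definition (illustrative only)."""
    digest = hashlib.md5()
    digest.update(prefix.encode())         # assumed: the prefix takes part in the hash
    digest.update(env_yaml_text.encode())  # the YAML content determines the name
    return f"{prefix}/{digest.hexdigest()[:8]}"


env_v1 = "channels:\n- bioconda\ndependencies:\n- samtools=1.9\n"
env_v2 = "channels:\n- bioconda\ndependencies:\n- samtools=1.10\n"

first_run = env_dir_name(env_v1)
second_run = env_dir_name(env_v1)   # same file content on a second call
after_edit = env_dir_name(env_v2)   # edited environment file

print(first_run == second_run)
print(first_run == after_edit)
```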
The description of my environment looks like:
channels:
- bioconda
- conda-forge
dependencies:
- bwa=0.7.17
- samtools=1.9
- picard=2.22.1
- mosdepth=0.2.6
- python=3.7.6
- pandas=1.0.3
- fastqc=0.11.9
and it is saved at envs/NIPTlibPrep.yaml (relative to the Snakefile).
I find it really puzzling that the workflow finishes locally but cannot be run on the cluster, especially since the environment with the correct dependencies was created successfully.
Take a look at: https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management
You will have to make a conda environment YAML file for the rule sort_bam, something like this:
channels:
- bioconda
dependencies:
- samtools
And then in your rule you need to refer to this file under the conda attribute:
rule sort_bam:
    input:
        "aligned/{sample}.bam"
    output:
        protected("aligned/{sample}.sorted.bam")
    params:
        THREADS = config['sort_threads']
    conda:
        "samtools.yaml"
    shell:
        "samtools sort -T aligned/{wildcards.sample} "
        "-O bam {input} > {output}"
Then you can call snakemake -np -s Snakefile_v4_ngs_bngs05b --cluster qsub -j 5 --use-conda and Snakemake will take care of the rest.
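Note that `-n` makes the call a dry run: Snakemake prints what it would do without submitting anything. A small sketch of how the same command line could be assembled with and without that flag follows; `build_snakemake_cmd` is a hypothetical helper, and only flags that already appear in this thread are used:

```python
import shlex


def build_snakemake_cmd(snakefile: str, jobs: int, dry_run: bool = True) -> list:
    """Assemble the snakemake call from this thread (hypothetical helper)."""
    cmd = ["snakemake", "-p"]
    if dry_run:
        cmd.append("-n")  # dry run: show the planned jobs, execute nothing
    cmd += ["-s", snakefile, "--cluster", "qsub", "-j", str(jobs), "--use-conda"]
    return cmd


dry = build_snakemake_cmd("Snakefile_v4_ngs_bngs05b", 5, dry_run=True)
real = build_snakemake_cmd("Snakefile_v4_ngs_bngs05b", 5, dry_run=False)

print(shlex.join(dry))
print(shlex.join(real))
```

Checking the dry run first and then dropping `-n` keeps the two invocations identical apart from that one flag.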