python, python-3.x, conda, snakemake

submitting a snakemake job to the cluster from within a ('correct') conda environment


I am writing a Snakemake file that performs multiple operations on multiple samples. After validating the workflow on my local computer, I am now working on running it on a cluster.

My first two rules are independent of one another: the first uses fastqc and the second bwa mem.

The two rules look like this (at this point I am only calling the workflow on a single sample, SAMPLE = 'NIPT-PearlPPlasma-03-PPx_S3downSample'):

rule fastQC:
    input:
        R1 = FQDIR + "{sample}_R1_001.fastq.gz",
        R2 = FQDIR + "{sample}_R2_001.fastq.gz"
    output:
        directory("fastQC/{sample}")
    conda:
        "envs/NIPTlibPrep.yaml"
    log:
        "logs/fastQC/{sample}.log" # log was giving an error when running at the command line
    shell:
        # 2> {log} at the end of the command removed
        # See wrapper at https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/fastqc.html
        "mkdir -p fastQC/{wildcards.sample} | fastqc --outdir fastQC/{wildcards.sample} -f fastq {input.R1} {input.R2}"


rule bwa_map: 
    input:
        R1 = FQDIR + "{sample}_R1_001.fastq.gz",
        R2 = FQDIR + "{sample}_R2_001.fastq.gz",
        REF = config['ref']    
    output:
        # wrap output in temp
        "aligned/{sample}.bam"
    log:
        "logs/bwa_mem/{sample}.log" 
    conda:
        "envs/NIPTlibPrep.yaml"
    shell:
        "bwa mem {input.REF} {input.R1} {input.R2} "
        "| samtools view -Sb - > {output} 2> {log}"

But when I call:

snakemake -p -s Snakefile_v4_ngs_bngs05b --cluster qsub -j 5 --use-conda

I get:

Error in rule bwa_map:
    jobid: 10
    output: aligned/NIPT-PearlPPlasma-03-PPx_S3downSample.bam
    log: logs/bwa_mem/NIPT-PearlPPlasma-03-PPx_S3downSample.log (check log file(s) for error message)
    conda-env: /nexusb/nipt/200311_A00154_0454_AHHHKMDRXX/testMetrics/outSnakeMake_test/.snakemake/conda/38107c2c
    shell:
        bwa mem /home/ngs/data/genomes/b37/human_g1k_v37.fasta /nexusb/Novaseq/200311_A00154_0454_AHHHKMDRXX/Unaligned/NIPT-PearlPPlasma-03-PPx_S3downSample_R1_001.fastq.gz /nexusb/Novaseq/200311_A00154_0454_AHHHKMDRXX/Unaligned/NIPT-PearlPPlasma-03-PPx_S3downSample_R2_001.fastq.gz | samtools view -Sb - > aligned/NIPT-PearlPPlasma-03-PPx_S3downSample.bam 2> logs/bwa_mem/NIPT-PearlPPlasma-03-PPx_S3downSample.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 381368 ("snakejob.bwa_map.10.sh") has been submitted

Error executing rule bwa_map on cluster (jobid: 10, external: Your job 381368 ("snakejob.bwa_map.10.sh") has been submitted, jobscript: /nexusb/nipt/200311_A00154_0454_AHHHKMDRXX/testMetrics/outSnakeMake_test/.snakemake/tmp.bnhr7qck/snakejob.bwa_map.10.sh). For error details see the cluster log and the log files of the involved rule(s).
[Wed Apr  8 17:21:45 2020]
Error in rule fastQC:
    jobid: 1
    output: fastQC/NIPT-PearlPPlasma-03-PPx_S3downSample
    log: logs/fastQC/NIPT-PearlPPlasma-03-PPx_S3downSample.log (check log file(s) for error message)
    conda-env: /nexusb/nipt/200311_A00154_0454_AHHHKMDRXX/testMetrics/outSnakeMake_test/.snakemake/conda/38107c2c
    shell:
        mkdir -p fastQC/NIPT-PearlPPlasma-03-PPx_S3downSample | fastqc --outdir fastQC/NIPT-PearlPPlasma-03-PPx_S3downSample -f fastq /nexusb/Novaseq/200311_A00154_0454_AHHHKMDRXX/Unaligned/NIPT-PearlPPlasma-03-PPx_S3downSample_R1_001.fastq.gz /nexusb/Novaseq/200311_A00154_0454_AHHHKMDRXX/Unaligned/NIPT-PearlPPlasma-03-PPx_S3downSample_R2_001.fastq.gz
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 381369 ("snakejob.fastQC.1.sh") has been submitted

Error executing rule fastQC on cluster (jobid: 1, external: Your job 381369 ("snakejob.fastQC.1.sh") has been submitted, jobscript: /nexusb/nipt/200311_A00154_0454_AHHHKMDRXX/testMetrics/outSnakeMake_test/.snakemake/tmp.bnhr7qck/snakejob.fastQC.1.sh). For error details see the cluster log and the log files of the involved rule(s).
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

Upon first execution of the workflow I noticed that the environment was created under .snakemake/conda (relative to the Snakefile). When I run the workflow a second time, without changing the conda directives, Snakemake reuses the same conda environment.
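
To check that this environment really contains the tools the two rules call, it can be activated directly. A quick sanity check, assuming conda is initialised in the shell and using the environment path printed in the error messages above (the hash will differ on other setups):

# activate the environment Snakemake created (path taken from the error output above)
conda activate /nexusb/nipt/200311_A00154_0454_AHHHKMDRXX/testMetrics/outSnakeMake_test/.snakemake/conda/38107c2c

# confirm the tools used by fastQC and bwa_map are on PATH in this environment
which bwa fastqc samtools
fastqc --version
samtools --version | head -n 1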

The description of my environment looks like:

channels:
  - bioconda
  - conda-forge
dependencies:
  - bwa=0.7.17
  - samtools=1.9
  - picard=2.22.1
  - mosdepth=0.2.6
  - python=3.7.6
  - pandas=1.0.3
  - fastqc=0.11.9

and it is saved at envs/NIPTlibPrep.yaml (relative to the Snakefile).
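
As an additional sanity check, the same file can be used to build the environment by hand and confirm that it resolves; a sketch, where the environment name nipt-test is arbitrary:

# build the environment manually from the same YAML
conda env create -f envs/NIPTlibPrep.yaml -n nipt-test
conda activate nipt-test
fastqc --version
samtools --version | head -n 1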

I find it really puzzling that the workflow finishes locally but cannot be run on the cluster, especially since the environment with the correct dependencies was successfully created.


Solution

  • Take a look at: https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management

    You will have to make a conda environment YAML for the rule sort_bam, something like this:

    channels:
      - bioconda
    dependencies:
      - samtools
    

    And then in your rule you need to refer to this file under the conda directive:

    rule sort_bam:
        input:
            "aligned/{sample}.bam"
        output:
            protected("aligned/{sample}.sorted.bam")
        params:
            THREADS = config['sort_threads']
        conda:
            "samtools.yaml"
        shell:
            "samtools sort -T aligned/{wildcards.sample} "
            "-O bam {input} > {output}"
    

    Then you can call snakemake -p -s Snakefile_v4_ngs_bngs05b --cluster qsub -j 5 --use-conda and Snakemake will take care of the rest.
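
    If the cluster jobs still fail while the same rules run fine locally, it can also help to be more explicit in the --cluster string, so that the submitted jobs inherit the environment and working directory of the submitting shell and write their scheduler logs somewhere you can inspect. A sketch, assuming an SGE-style qsub (which the "Your job ... has been submitted" messages above suggest); the logs/cluster directory is an arbitrary choice:

    # keep the scheduler's stdout/stderr files out of the working directory root
    mkdir -p logs/cluster
    # -V exports the submitting shell's environment, -cwd runs the job in the current directory
    snakemake -p -s Snakefile_v4_ngs_bngs05b --use-conda -j 5 \
        --cluster "qsub -V -cwd -o logs/cluster -e logs/cluster"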