multithreading, cluster-computing, slurm, snakemake

How to use multithreading on a Slurm cluster with Snakemake?


I understand how to tell Snakemake to use multiple threads for a given rule with the threads directive, and I have a Snakefile with a rule that uses multiple threads. If I run Snakemake directly on the cluster, e.g. snakemake --cores 10 -n, the dry-run output correctly reports that the rule will use 10 threads, so I conclude that the rule syntax in my Snakefile is correct. However, if I instead submit the pipeline through the Slurm scheduler, i.e. snakemake --profile slurm/config.yaml -n, the output shows that the rule will use at most 1 thread. Am I missing a Slurm parameter in my config file? From what I could find, --cpus-per-task {threads} should take care of this, but it is not doing what I want.

Here is the Snakefile rule:

rule bwa_map:
    input:
        fa="data/ref.fa",
        ind="data/ref.fa.bwt",
        r1="data/{sample}_R1.fastq.gz",
        r2="data/{sample}_R2.fastq.gz"
    output:
        "mapped_reads/{sample}.bam"
    shell:
        "bwa mem -t 2 {input.fa} {input.r1} {input.r2} | samtools view -b - > {output}"

And here is my config.yaml file:

cluster: "sbatch --parsable -p high2 --time={resources.runtime} --mem={resources.mem_mb} --job-name={rule}.{wildcards} -o snake_logs/{rule}.{wildcards}.%j.out -e snake_logs/{rule}.{wildcards}.%j.err --cpus-per-task {threads}"


Solution

  • Try adding threads: 10 to your rule. Without a threads directive the rule defaults to 1 thread, so Snakemake has no larger value to pass to sbatch through {threads}:

    rule bwa_map:
        input:
            fa="data/ref.fa",
            ind="data/ref.fa.bwt",
            r1="data/{sample}_R1.fastq.gz",
            r2="data/{sample}_R2.fastq.gz"
        output:
            "mapped_reads/{sample}.bam"
        threads: 10
        shell:
            "bwa mem -t 2 {input.fa} {input.r1} {input.r2} | samtools view -b - > {output}"
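
    A possible refinement, sketched here rather than taken from the answer itself: the command above still hard-codes bwa mem -t 2, so the job would reserve 10 CPUs through --cpus-per-task {threads} while bwa runs on only 2. Snakemake substitutes the {threads} placeholder in shell commands, so passing it to bwa should keep the reservation and the actual usage in step. The value 10 is simply carried over from the example above and may need adjusting for your cluster.

    rule bwa_map:
        input:
            fa="data/ref.fa",
            ind="data/ref.fa.bwt",
            r1="data/{sample}_R1.fastq.gz",
            r2="data/{sample}_R2.fastq.gz"
        output:
            "mapped_reads/{sample}.bam"
        # Upper bound on threads for this rule; Snakemake caps it at the
        # number of cores made available via --cores.
        threads: 10
        # {threads} expands to the thread count Snakemake assigned to the job,
        # the same value that fills --cpus-per-task {threads} in the profile.
        shell:
            "bwa mem -t {threads} {input.fa} {input.r1} {input.r2} | samtools view -b - > {output}"

    With this form, changing threads in one place changes both what sbatch requests and what bwa uses.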