python, bioinformatics, snakemake

Submitted Snakemake rules start to run and fail immediately without throwing any error, but Snakemake keeps running


I have written a Snakemake pipeline for my project, part of which looks like this:

SAMPLES, = glob_wildcards("/absolute/path/to/samples/{sample}.bam")
rule all:
    input:
       expand("splits/sample_check/{sample}_done.txt", sample=SAMPLES)

rule GVCFSplit:
    input:
        "gvcf/{SAMPLES}/",
        #"chr_pos_test/chr{c}/chr{c}_reg{i}.txt"
    output:
        "splits/sample_check/{SAMPLES}_done.txt"
    log:
        "logs/GVCFSplit/{SAMPLES}_done.log"
    benchmark:
        "benchmarks/GVCFSplit/{SAMPLES}_done.benchmark.txt"
    envmodules:
        "bcftools"
    resources:
        mem='1g',
        time='4:00:00',
        threads=1
    shell:
        r"""
            python3 /absolute/path/to/python/script/GVCF_split.py {wildcards.SAMPLES}
        """

The rule divides the per-chromosome files into chunks of 50 Mb with the help of the Python script below:

from pathlib import Path
import subprocess
import sys

sample_id = sys.argv[1].strip()
chrs = list(range(1, 23))
# NOTE: chr_reg (chromosome number -> number of regions) is not defined in this
# excerpt; it has to exist before the loop below.
exit_code = 0  # sum of bcftools return codes (renamed so it does not clash with the built-in exit)

for c in chrs:
    sample_file = "/path/to/chromosome/files/per/sample/%s/%s_chr%i.g.vcf.gz" % (sample_id, sample_id, c)
    for r in range(1, chr_reg[c] + 1):
        reg_file = "/path/to/per/chromosome/regions/chr%i/chr%i_reg%i.txt" % (c, c, r)
        #out_file=("try/gvcf/splits/chr%i/%s_reg%i.g.vcf.gz") % (c,sample_id,r)
        out_file = "/path/to/spiltted/vcf/files/chr%i/%s_reg%i.g.vcf.gz" % (c, sample_id, r)
        #Path(out_file).touch()
        # Extract the regions listed in reg_file into a bgzip-compressed VCF
        proc = subprocess.run(["bcftools", "view", sample_file, "-Oz", "-o", out_file, "-R", reg_file])
        exit_code += proc.returncode

if exit_code == 0:
    # Create a marker file so Snakemake can see that every region succeeded
    Path("splits/sample_check/" + sample_id + "_done.txt").touch()
    sys.exit(0)
else:
    sys.exit(1)
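
When a loop like the one above only sums return codes, it is hard to tell which command failed and why. A minimal sketch of one way to surface that information is shown below. The helpers `run_all` and `report` are hypothetical and not part of the original script, and the demo uses the Python interpreter as a stand-in for the bcftools calls.

```python
import subprocess
import sys


def run_all(commands):
    """Run each command and collect (command, returncode, stderr) for every failure."""
    failures = []
    for cmd in commands:
        # capture_output keeps stderr so it can be reported next to the command
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            failures.append((cmd, proc.returncode, proc.stderr.strip()))
    return failures


def report(failures, stream=sys.stderr):
    """Print one entry per failed command so it ends up in the job's log."""
    for cmd, code, err in failures:
        print(f"FAILED ({code}): {' '.join(cmd)}\n  {err}", file=stream)


if __name__ == "__main__":
    # Stand-in commands (assumption): one succeeds, one exits with status 3.
    demo = [
        [sys.executable, "-c", "pass"],
        [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"],
    ]
    report(run_all(demo))
```

In the splitting script, the list passed to `run_all` would hold the bcftools commands, and the marker file would only be touched when the returned list is empty.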

When I run the Python script manually as:

python3 GVCF_split.py "sample_id"

it runs smoothly. However, when I submit this Snakemake file to the cluster with --profile, the rules are submitted per sample as expected, but they fail immediately after they start running. Snakemake keeps running after that and no error is thrown. Here is the config file I use with the --profile flag:

cluster: mkdir -p slurm_snake/`basename {workflow.main_snakefile}`/{rule} &&
  sbatch
  --partition={resources.partition}
  --cpus-per-task={resources.threads}
  --mem={resources.mem}
  --time={resources.time}
  --job-name=smk-{rule}-{wildcards}
  --output=try/slurm_snake/`basename {workflow.main_snakefile}`/{rule}/{rule}-{wildcards}-%j.out
default-resources:
  - partition=main
  - mem='4G'
  - time="24:0:0"
  - threads=1
restart-times: 0
max-jobs-per-second: 5
max-status-checks-per-second: 1
local-cores: 1
latency-wait: 60
jobs: 1000
keep-going: True
rerun-incomplete: True
printshellcmds: True
scheduler: greedy

I have a similar setup for my original Snakemake file (this one is a copy for trying out a few changes before I apply them to the original). In the original, the individual SLURM output files for each submission of each rule are kept in the slurm_snake folder. For these rules, however, no SLURM output files are written. What could be the reason, and what am I doing wrong when submitting these to the cluster?

Here is an example of the slurm output of the main snakemake cluster submission:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 1000
Job stats:
job          count    min threads    max threads
---------  -------  -------------  -------------
GVCFSplit        5              1              1
all              1              1              1
total            6              1              1

Select jobs to execute...

[Thu Mar 14 14:47:00 2024]
rule GVCFSplit:
    input: gvcf/12_19264_20
    output: splits/sample_check/12_19264_20_done.txt
    log: logs/GVCFSplit/12_19264_20_done.log
    jobid: 5
    benchmark: benchmarks/GVCFSplit/12_19264_20_done.benchmark.txt
    reason: Missing output files: splits/sample_check/12_19264_20_done.txt
    wildcards: SAMPLES=12_19264_20
    resources: mem_mb=1000, disk_mb=1000, tmpdir=/tmp, partition=main, mem=1g, time=4:00:00, threads=1


            python3 /path/to/script/GVCF_split.py 12_19264_20

The python script runs without any error when I run it from the terminal manually.


Solution

  • Apparently, this was caused by a mistake in my config file. I will leave the question up as a reminder and as a possible solution for others who run into the same issue. The config file had a section like the one below for cluster submissions:

    cluster: mkdir -p slurm_snake/`basename {workflow.main_snakefile}`/{rule} &&
      sbatch
      --partition={resources.partition}
      --cpus-per-task={resources.threads}
      --mem={resources.mem}
      --time={resources.time}
      --job-name=smk-{rule}-{wildcards}
      --output=slurm_snake/`basename {workflow.main_snakefile}`/{rule}/{rule}-{wildcards}-%j.out
    

    This part tells the cluster which resources to use, where to write the SLURM output, and so on. My mistake was having the wrong directory in the --output flag, something like main_dir/slurm-snake/.... Because SLURM could not write the output to a directory that does not exist, the submitted jobs failed immediately. After I fixed this, my pipeline ran smoothly.
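
A mismatch between the directory created by `mkdir -p` and the directory used by `--output` can be checked before submitting. The sketch below is illustrative only: `sbatch_dirs` is a hypothetical helper, and its regular expressions assume the layout of the cluster command shown in this post (a `mkdir -p ... &&` prefix and `--option=value` flags). The demo string is the command from the question, folded onto one line.

```python
import posixpath
import re


def sbatch_dirs(cluster_cmd):
    """Return (directory created by mkdir -p, directory used by --output)."""
    made = re.search(r"mkdir\s+-p\s+(.+?)\s*&&", cluster_cmd, re.S)
    # The value of --output runs until the next "--flag" or the end of the string
    out = re.search(r"--output=(.+?)(?=\s+--|\s*$)", cluster_cmd, re.S)
    made_dir = made.group(1).strip().rstrip("/") if made else None
    out_dir = posixpath.dirname(out.group(1).strip()) if out else None
    return made_dir, out_dir


if __name__ == "__main__":
    cmd = (
        "mkdir -p slurm_snake/`basename {workflow.main_snakefile}`/{rule} && "
        "sbatch --partition={resources.partition} "
        "--output=try/slurm_snake/`basename {workflow.main_snakefile}`/{rule}/{rule}-{wildcards}-%j.out"
    )
    made_dir, out_dir = sbatch_dirs(cmd)
    if made_dir != out_dir:
        print(f"mkdir creates {made_dir!r} but --output writes to {out_dir!r}")
```

Comparing the two strings is only a textual check. It does not prove the directory exists on the cluster, but it flags the kind of typo described above.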