Snakemake error when running two jobs at once that use the same conda environment


I am encountering an error when executing a Snakemake (6.0.0) workflow in which two jobs that use the same conda environment are launched simultaneously on the same node. A minimal example is below.

A few observations:

  • The problem occurs when running the workflow on a node of my institutional cluster, but not on my local machine. (I am running snakemake from within an interactive slurm job with >1 cpu; I am using Miniconda, provided as a module by my cluster sysadmins)
  • The workflow completes just fine when the tasks are forced to run serially (snakemake --use-conda -j1). The problem only occurs with -j2 or higher (not exceeding the number of cores available in the slurm allocation). The first job seems to run just fine; it's always the second job that fails.
  • I can activate the conda environment at issue just fine after it has been created by snakemake (e.g. after running the workflow, conda activate /long_path_to_cluster_project_folder/testing/conda_test/.snakemake/conda/c4751dca works, I can run R from within, etc.)
  • If I run snakemake --use-conda -j2, the only error I get is (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!) below the shell commands run. If I add --verbose, a lengthy traceback is printed in blue and red, which I've included below. The relevant bit seems to be:
      File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 505, in prefix_path
        return self.info["conda_prefix"]
    AttributeError: 'Conda' object has no attribute 'info'
    
  • Suspecting some kind of race condition, I also tried adding --max-jobs-per-second=0.5 to throttle the jobs so they wouldn't start at the same time, but that appeared to have no effect (the jobs still start at the same time, with the same error as before). I am not running snakemake with --cluster or --profile or anything; there are no extra slurm jobs being created, just processes spawned on the same compute node.
  • The same problem occurs if I create two totally different Snakemake rules that end up being executed at once, as long as both rules use the same conda environment.

I'm pretty new to both snakemake and HPC, but it seems like this is somewhere between a system- or configuration-specific problem (since it only occurs on the cluster) and a minor snakemake bug (since snakemake attributes the problem to my shell script rather than to something involving conda). I'm interested in suggestions for how to troubleshoot further or work around the problem.

Thanks!

Minimal example:

├── input.txt
├── results
└── workflow
    ├── Snakefile
    └── envs
        └── env1.yaml
  • workflow/Snakefile:

    rule all:
        input:
            'results/output1.txt',
            'results/output2.txt',
            'results/output3.txt',
            'results/output4.txt'
    
    rule rule1:
        input: 'input.txt'
        output:
            'results/output{n}.txt'
        conda: 'envs/env1.yaml'
        shell:"""
        sleep 5s
        touch {output}
        """
    
  • workflow/envs/env1.yaml:

    channels:
    - conda-forge
    - bioconda
    - defaults
    dependencies:
    - r-ggplot2
    
$ snakemake --use-conda -j2 -p --verbose
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   all
    4   rule1
    5

<< snip >>


[Fri Mar  5 21:01:33 2021]
Error in rule rule1:
    jobid: 2
    output: results/output2.txt
    conda-env: /long_path_to_cluster_project_folder/testing/conda_test/.snakemake/conda/c4751dca
    shell:
        
    sleep 5s
    touch results/output2.txt
    
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Full Traceback (most recent call last):
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2326, in run_wrapper
    run(
  File "/long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile", line 33, in __rule_rule1
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/shell.py", line 141, in __new__
    cmd = Conda(container_img).shellcmd(conda_env, cmd)
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 512, in shellcmd
    activate = os.path.join(self.bin_path(), "activate")
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 508, in bin_path
    return os.path.join(self.prefix_path(), "bin")
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 505, in prefix_path
    return self.info["conda_prefix"]
AttributeError: 'Conda' object has no attribute 'info'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 568, in _callback
    raise ex
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
    run_func(*args)
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2357, in run_wrapper
    raise RuleException(
snakemake.exceptions.RuleException: AttributeError in line 13 of /long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile:
'Conda' object has no attribute 'info'
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2326, in run_wrapper
  File "/long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile", line 13, in __rule_rule1
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 512, in shellcmd
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 508, in bin_path
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 505, in prefix_path

RuleException:
AttributeError in line 13 of /long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile:
'Conda' object has no attribute 'info'
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2326, in run_wrapper
  File "/long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile", line 13, in __rule_rule1
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 512, in shellcmd
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 508, in bin_path
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 505, in prefix_path
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/concurrent/futures/thread.py", line 52, in run
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2357, in run_wrapper

Solution

  • Try updating to Snakemake v6.0.2 or later. The problem appears to be a bug in the v6.0.0 release that was patched in v6.0.2 (Release Notes). You're right on the money about it being a race condition (see commit).
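
For intuition about how a race condition can produce an error like "object has no attribute 'info'", here is a minimal sketch. It is not Snakemake's actual code: the Tool class, the get_unsafe and get_safe helpers, and the "/hypothetical/prefix" value are all made up for illustration. It assumes the general pattern of a shared, cached helper object whose attribute is only set at the end of a slow setup step, and it uses events to force the unlucky timing deterministically.

```python
import threading


class Tool:
    """Hypothetical cached helper; not Snakemake's actual Conda class."""

    def setup(self, pause=None):
        if pause is not None:
            pause()  # stand-in for a slow step, e.g. querying conda
        self.info = {"conda_prefix": "/hypothetical/prefix"}


def get_unsafe(cache, key, lock, pause=None):
    # Publishes the object in the shared cache BEFORE setup has finished.
    # The lock argument is ignored; it exists only to match get_safe.
    if key not in cache:
        cache[key] = Tool()
        cache[key].setup(pause)
    return cache[key]


def get_safe(cache, key, lock, pause=None):
    # Holds the lock across the whole check, create, and setup sequence.
    with lock:
        if key not in cache:
            tool = Tool()
            tool.setup(pause)
            cache[key] = tool
        return cache[key]


def second_job_sees_info(getter):
    cache, lock = {}, threading.Lock()
    in_setup, release = threading.Event(), threading.Event()
    seen = {}

    def pause():
        in_setup.set()           # signal that the first job is mid-setup
        release.wait(timeout=5)  # stay there until told to continue

    def first_job():
        getter(cache, "env", lock, pause)

    def second_job():
        tool = getter(cache, "env", lock)
        seen["info"] = hasattr(tool, "info")

    t1 = threading.Thread(target=first_job)
    t2 = threading.Thread(target=second_job)
    t1.start()
    in_setup.wait(timeout=5)  # first job is now paused inside setup
    t2.start()
    t2.join(timeout=0.5)      # unsafe: returns at once; safe: blocked on the lock
    release.set()             # let the first job finish setup
    t1.join()
    t2.join()
    return seen["info"]


if __name__ == "__main__":
    print("unsafe getter, second job sees info:", second_job_sees_info(get_unsafe))
    print("safe getter, second job sees info:", second_job_sees_info(get_safe))
```

With the unsafe getter, the second job picks up the cached object while the first job is still inside setup, so the attribute is missing. With the lock held around the whole sequence, the second job waits and sees a fully initialized object. This also suggests why -j1 avoids the error in the question: only one job at a time ever touches the shared object.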