Snakemake conda env parameter is not taken from config.yaml file


I use a conda env that I create manually, not automatically using Snakemake. I do this to keep tighter version control.

Anyway, in my config.yaml I have the following line:

conda_env: '/rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake'

Then, at the start of my Snakefile, I read that variable (reading config values directly in the shell section does not seem to work, am I right?):

conda_env = config['conda_env']

Then, in the shell section of a rule, I reference that variable like this:

rule rsem_quantify:
    input:
        os.path.join(fastq_dir, '{sample}_R1_001.fastq.gz'),
        os.path.join(fastq_dir, '{sample}_R2_001.fastq.gz')
    output:
        os.path.join(analyzed_dir, '{sample}.genes.results'),
        os.path.join(analyzed_dir, '{sample}.STAR.genome.bam')
    threads: 8
    shell:
        '''
        #!/bin/bash
        source activate {conda_env}

        rsem-calculate-expression \
        --paired-end \
        {input} \
        {rsem_ref_base} \
        {analyzed_dir}/{wildcards.sample} \
        --strandedness reverse \
        --num-threads {threads} \
        --star \
        --star-gzipped-read-file \
        --star-output-genome-bam
        '''

Notice the {conda_env}. Now this gives me the following error:

Could not find conda environment: None
You can list all discoverable environments with `conda info --envs`.

Now, if I replace {conda_env} with its value directly, /rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake, it does work! I don't have any trouble reading other variables using this method (like rsem_ref_base and analyzed_dir in the example rule above).

What could be wrong here?

Highest regards,

Freek.


Solution

  • The pattern I use is to load variables into params, so something along these lines:

    rule rsem_quantify:
        input:
            os.path.join(fastq_dir, '{sample}_R1_001.fastq.gz'),
            os.path.join(fastq_dir, '{sample}_R2_001.fastq.gz')
        output:
            os.path.join(analyzed_dir, '{sample}.genes.results'),
            os.path.join(analyzed_dir, '{sample}.STAR.genome.bam')
        params:
            conda_env=config['conda_env']
        threads: 8
        shell:
            '''
            #!/bin/bash
            source activate {params.conda_env}
    
            rsem-calculate-expression \
    
            ...
    
            '''
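
A possible explanation for the `None`, offered as an assumption rather than something confirmed in this thread: Snakemake appears to pass its own `conda_env` value into the scope where the shell command is formatted, and that value is `None` when the rule has no `conda:` directive, so it would shadow a Snakefile-level variable of the same name. The plain-Python sketch below only models that shadowing. `SNAKEFILE_GLOBALS`, `format_shell`, `run_rule`, and the path `/path/to/env` are hypothetical names for illustration and are not Snakemake's actual code.

```python
# Hypothetical model of name shadowing; this is NOT Snakemake's implementation.

# Stands in for the Snakefile-level `conda_env = config['conda_env']`.
SNAKEFILE_GLOBALS = {'conda_env': '/path/to/env'}


def format_shell(template, local_names):
    """Fill {placeholders}, letting rule-local names win over global ones."""
    names = {**SNAKEFILE_GLOBALS, **local_names}  # later keys override earlier
    return template.format(**names)


def run_rule(params, conda_env=None):
    """Stand-in for a rule body that receives its own `conda_env` argument."""
    local_names = {'params': params, 'conda_env': conda_env}
    shadowed = format_shell('source activate {conda_env}', local_names)
    via_params = format_shell('source activate {params[conda_env]}', local_names)
    return shadowed, via_params


shadowed, via_params = run_rule(params={'conda_env': SNAKEFILE_GLOBALS['conda_env']})
print(shadowed)    # the rule-local None wins over the global path
print(via_params)  # the value routed through params is unaffected
```

If this is indeed the cause, renaming the Snakefile variable (for example to a name such as `my_conda_env`) would likely avoid the clash as well, which is consistent with `rsem_ref_base` and `analyzed_dir` working as expected.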
    

    That said, I would not manage a conda environment this way, because Snakemake has conda environment management built in. See the section on Integrated Package Management in the Snakemake docs for details. This makes reproducibility much more manageable.
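
As a rough sketch of what the integrated approach could look like for this rule, the Snakefile fragment below uses the `conda:` directive instead of activating an environment by hand. The file name `envs/rsem.yaml` is hypothetical; it is assumed to be an environment file that lists the packages providing RSEM and STAR. The variables `fastq_dir`, `analyzed_dir`, and `rsem_ref_base` are assumed to be defined earlier in the Snakefile, as in the question.

```python
# Snakefile fragment (Snakemake rule syntax, not standalone Python).
# Assumes `import os` and the variables fastq_dir, analyzed_dir and
# rsem_ref_base are defined earlier in the Snakefile, as in the question.
rule rsem_quantify:
    input:
        os.path.join(fastq_dir, '{sample}_R1_001.fastq.gz'),
        os.path.join(fastq_dir, '{sample}_R2_001.fastq.gz')
    output:
        os.path.join(analyzed_dir, '{sample}.genes.results'),
        os.path.join(analyzed_dir, '{sample}.STAR.genome.bam')
    conda:
        # Hypothetical environment file, resolved relative to this Snakefile.
        'envs/rsem.yaml'
    threads: 8
    shell:
        '''
        rsem-calculate-expression \
        --paired-end \
        {input} \
        {rsem_ref_base} \
        {analyzed_dir}/{wildcards.sample} \
        --strandedness reverse \
        --num-threads {threads} \
        --star \
        --star-gzipped-read-file \
        --star-output-genome-bam
        '''
```

The directive only takes effect when conda deployment is enabled on the command line, for example with `snakemake --use-conda`; the exact flag may differ depending on your Snakemake version. Pinning package versions in the environment file keeps the version control that the manually created environment was meant to provide.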