When writing a program in a JupyterLab notebook using R or Python, I install specific conda environments as kernels so that I can access environment-specific packages from a single JupyterLab installation in the base conda environment.
After I finish developing the notebook, I want to plug it into my Snakefile to ensure reproducibility later, and of course I do that with the corresponding conda environment .yaml file so that all the needed packages/libraries are provided.
Now comes the problem: the rule that references the notebook is not reproducible on another machine or environment, because it tries to run a kernel that exists only in my development environment.
Does anyone have a workaround or solution to this particular problem?
EDIT: More detailed steps leading to my problem
conda create -n matplotlib_env python=3.8 ipykernel matplotlib nbconvert
python -m ipykernel install --user --name matplotlib_env --display-name "Python_matplotlib"
import matplotlib.pyplot as plt
plt.plot([1,2,3],[1,4,9])
plt.savefig('exp.png')
conda env export > matplotlib_env.yaml
rule make_plot:
    output:
        "exp.png"
    conda:
        "matplotlib_env.yaml"
    notebook:
        "make_plot.ipynb"
snakemake -p --cores 1 --use-conda make_plot
Error message after removing the original environment:
[NbConvertApp] ERROR | Failed to run command:
...
FileNotFoundError: [Errno 2] No such file or directory: '/home/miniconda3/envs/matplotlib_env/bin/python'
Error message after removing the kernel from the kernel list:
...
raise NoSuchKernel(kernel_name)
jupyter_client.kernelspec.NoSuchKernel: No such kernel named matplotlib_env
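Both errors point at the same cause: when the notebook was last saved, Jupyter recorded the active kernel's name in the notebook's JSON metadata (`metadata.kernelspec.name` in nbformat v4), and the notebook is executed by asking for that kernel by name. A minimal standard-library sketch to check which kernel a notebook will request; the helper name `requested_kernel` is hypothetical:

```python
import json
from pathlib import Path


def requested_kernel(notebook_path):
    """Return the kernel name stored in a notebook's metadata, or None if absent."""
    # .ipynb files are plain JSON; in nbformat v4 the kernel is under metadata.kernelspec
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    return nb.get("metadata", {}).get("kernelspec", {}).get("name")


if __name__ == "__main__":
    import sys

    # Usage: python requested_kernel.py make_plot.ipynb [more.ipynb ...]
    for path in sys.argv[1:]:
        print(f"{path}: {requested_kernel(path)}")
```

For the notebook in the question, this would be expected to report `matplotlib_env`, i.e. a name that only exists where that kernel was registered.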
I found a workaround: before saving the notebook for the last time, switch its kernel to the default Python or R kernel from your base environment. These default kernels always have the same name after installation (e.g. "Python 3"), so the notebook no longer refers to a kernel that exists only on your machine. When Snakemake then executes the notebook, it looks for the default Python/R kernel and finds the one installed in the environment that Snakemake creates via the --use-conda option.
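Instead of switching the kernel by hand in JupyterLab, the same change could be scripted by rewriting the notebook's `metadata.kernelspec`. This is a hedged sketch, not part of the original workaround: the helper `set_kernelspec` is hypothetical, and it assumes the default Python kernel is named `python3` (the name ipykernel uses for the native kernel of the active interpreter).

```python
import json
from pathlib import Path

# Default kernelspec for the interpreter of the active environment.
# For R notebooks, IRkernel's default kernel name is typically "ir".
DEFAULT_PYTHON_KERNELSPEC = {
    "name": "python3",
    "display_name": "Python 3",
    "language": "python",
}


def set_kernelspec(notebook_path, kernelspec=DEFAULT_PYTHON_KERNELSPEC):
    """Rewrite metadata.kernelspec in place and return the previous kernel name."""
    path = Path(notebook_path)
    nb = json.loads(path.read_text(encoding="utf-8"))
    metadata = nb.setdefault("metadata", {})
    old_name = metadata.get("kernelspec", {}).get("name")
    metadata["kernelspec"] = dict(kernelspec)
    # Indented output keeps the file readable and diff-friendly
    path.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
    return old_name


if __name__ == "__main__":
    import sys

    # Usage: python set_kernelspec.py make_plot.ipynb [more.ipynb ...]
    for nb_path in sys.argv[1:]:
        print(f"{nb_path}: {set_kernelspec(nb_path)} -> python3")
```

Running it right before committing the notebook has the same effect as the manual kernel switch, while cells and outputs stay untouched.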
This is not the most elegant solution, but if you need or want to plug notebooks into your workflow, it is good enough.
The most rigorous approach is to turn the finished notebook into a script with clearly defined inputs and outputs, and to plug that script into the Snakefile via a corresponding rule instead of the notebook.
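The first part of that conversion, pulling the code out of the finished notebook, can be done with `jupyter nbconvert --to script make_plot.ipynb`. The standard-library sketch below only illustrates the idea and is not what nbconvert does internally: the function `notebook_to_script` is hypothetical, it keeps code cells only, and it comments out lines starting with `%` or `!` (IPython magics and shell escapes), which plain Python cannot run. In the resulting script, hard-coded paths such as `'exp.png'` would then be replaced by `snakemake.output[0]`, and the rule would use the `script:` directive instead of `notebook:`.

```python
import json
from pathlib import Path


def notebook_to_script(notebook_path, script_path):
    """Write a notebook's code cells to a plain Python script; return the number of cells."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are dropped
        source = cell.get("source", "")
        # nbformat allows the source as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        lines = []
        for line in source.splitlines():
            # IPython magics (%...) and shell escapes (!...) are not valid Python
            if line.lstrip().startswith(("%", "!")):
                line = "# " + line
            lines.append(line)
        chunks.append("\n".join(lines))
    Path(script_path).write_text("\n\n".join(chunks) + "\n", encoding="utf-8")
    return len(chunks)


if __name__ == "__main__":
    import sys

    # Usage: python notebook_to_script.py make_plot.ipynb make_plot.py
    if len(sys.argv) == 3:
        notebook_to_script(sys.argv[1], sys.argv[2])
```

The generated file is a starting point; defining inputs and outputs explicitly is still a manual step.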