I'm an admin in a computer science department where hundreds of users with limited filesystem quotas are trying to set up conda environments for various computational tasks. So I've been experimenting with setting up multi-user miniconda environments on shared storage, in an attempt to get at least the package installs out of users' home directories.
The simple method described here: https://docs.anaconda.com/free/anaconda/install/multi-user/ won't work for us because we have lots of users who can't necessarily be trusted, so a world- or even group-writable conda install is not an option. Users could just cd into the miniconda directory and wreak havoc.
The ideal I was aiming for was to set up a number of frequently used environments for things like numpy and pytorch in /mnt/opt/miniconda/envs, which users could then activate with (for example)
source /mnt/opt/miniconda/bin/activate pytorch
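For reference, the shared environments themselves were created from the root install along roughly these lines (the exact commands and package lists here are illustrative, not the literal ones we use):

# run as the admin account that owns /mnt/opt/miniconda
/mnt/opt/miniconda/bin/conda create -n numpy python=3.11 numpy
/mnt/opt/miniconda/bin/conda create -n pytorch python=3.11 pytorch
# keep the whole tree non-writable for ordinary users
chmod -R go-w /mnt/opt/miniconda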
This works, but users are unable to install supplemental packages because /mnt/opt/miniconda/envs/pytorch is read-only. What I thought might be possible is that these supplemental packages would be installed in /home/$USER/.conda/pkgs and associated with the environment for that user only, but this doesn't seem to work. However, in testing things I ran into a rather strange anomaly. If I run
source /mnt/opt/miniconda/bin/activate numpy
conda install scipy
the install fails with what amounts to a write permission error. If, however, I first set this environment variable in .bashrc:
export CONDA_PKGS_DIRS="/home/$USER/.conda/pkgs"
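(For what it's worth, the same thing can be configured per-user via ~/.condarc instead of an environment variable; as far as I can tell this is equivalent:

# writes a pkgs_dirs entry into ~/.condarc
conda config --add pkgs_dirs ~/.conda/pkgs

but everything below uses the CONDA_PKGS_DIRS export.)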
and then repeat:
source /mnt/opt/miniconda/bin/activate numpy
conda install scipy
the packages get installed in the correct directory:
(numpy) pgoetz@texas-tea pkgs$ pwd
/home/pgoetz/.conda/pkgs
(numpy) pgoetz@texas-tea pkgs$ ls
cache                                   scipy-1.11.1-py311h08b1b3b_0
libgfortran5-11.2.0-h1234567_1          scipy-1.11.1-py311h08b1b3b_0.conda
libgfortran5-11.2.0-h1234567_1.conda    urls
libgfortran-ng-11.2.0-h00389a5_1        urls.txt
libgfortran-ng-11.2.0-h00389a5_1.conda
but they're not accessible in the environment:
(numpy) pgoetz@texas-tea pkgs$ python
Python 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import scipy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'scipy'
>>>
So these are ... orphaned conda packages? Does anyone know what's going on here, or is this just an unanticipated edge case?
This sounds like expected behavior. Conda carries out package installation in two phases[1]:

1. FETCH: download the package archive and extract it into a package cache.
2. LINK: link (or copy) the extracted files from the cache into the target environment.
If a package cache is writable - which the user one (~/.conda/pkgs) always should be - then the FETCH stage should execute without a hitch. However, if the target environment is read-only, then the LINK phase could fail. Since cached packages are considered immutable, there is no reason to clean up anything from the successful FETCH, despite the LINK phase failing.
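One way to see the two halves separately: conda install has a --download-only flag that runs the solve and FETCH but exits before any UNLINK/LINK. So something like the following (using the setup from the question) should populate the user's cache while leaving the read-only environment untouched:

source /mnt/opt/miniconda/bin/activate numpy
export CONDA_PKGS_DIRS="$HOME/.conda/pkgs"
# solve + FETCH into the package cache, then stop before linking into the env
conda install --download-only scipy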
A conda clean -tp should remove the "orphaned" packages. However, be aware that if softlinks/symlinks are being used, the -p, --packages part could break environments (the -t, --tarballs flag is safe).
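If unsure, conda clean also accepts --dry-run, so the effect can be previewed first, e.g.:

# show what would be removed, without deleting anything
conda clean --dry-run --tarballs --packages
# removing just the tarballs is always safe
conda clean --tarballs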
[1]: There can also be an UNLINK phase between these if any packages get upgraded or removed in the transaction. If the target environment is read-only, UNLINK will fail in the same fashion as LINK.