Conda environment yaml has more dependencies than needed


conda==4.12.0

I'm trying to generate a clean conda environment YAML for my project, and I noticed that some dependencies end up in the file that probably shouldn't be there. The script that generates the environment YAML looks like this:

# Remove environment if it exists
yes | conda remove --name py37_clean --all

# create new environment
yes | conda create -n py37_clean python=3.7 pip
conda activate py37_clean
pip install -r ../requirements.txt

# create yaml file
conda env export > ../environment.yml

# exit environment and delete
conda deactivate
yes | conda remove --name py37_clean --all

In the generated environment.yml, I noticed at least one dependency that didn't seem necessary:

- pytorch=1.3.1=cpu_py37h0c87eb2_0

That made me concerned that the file defines other dependencies that aren't needed either. It's possible that pytorch is required by some other package, but I tried to check this with mamba repoquery depends -t pytorch and it couldn't find anything. The same command did work for other packages I tested, so I think it is working. Any thoughts on why pytorch (and possibly other dependencies) were added? I think it is related to this note from the earlier link:

This file handles both the environment's pip packages and conda packages.

But I'm not sure how to limit the conda packages to only what is necessary. I think I'm working with miniconda since which conda shows /Users/[me]/miniconda3/condabin/conda
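
A note on the repoquery check mentioned above: as far as I know, mamba repoquery depends lists what a package itself requires, while mamba repoquery whoneeds lists the installed packages that require it, which is the direction this question is about. A minimal sketch of both lookups follows; deps_of and needed_by are hypothetical helper names, and the queries are assumed to run with the exported environment active:

```shell
# Hypothetical helper names, used only for this example.
# Both queries are assumed to inspect the currently active environment.

# What does the package itself require?
deps_of() {
    mamba repoquery depends "$1"
}

# Which installed packages require this package? (reverse lookup)
needed_by() {
    mamba repoquery whoneeds "$1"
}
```

Calling needed_by pytorch in the environment that produced the YAML should show whether anything installed there pulls pytorch in. An empty result would suggest it was installed directly rather than as a dependency.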

My requirements.txt looks like this:

boto3==1.26.117
matplotlib==3.3.4
numpy==1.15  # required by pyspark
pandas==1.0.5  # required by pyspark
pre-commit==2.21.0
praw==7.7.0
pg8000==1.29.4  # this was easier to pip install than psycopg2
pyarrow==2.0.0
pyspark==3.3.0  # using this version because py37 deprecated in pyspark 3.4.0
scikit-learn==1.0.2
seaborn==0.11.2
shap==0.41.0
sqlalchemy==1.4.46  # originally tried 2.0.10, but this was incompatible with old versions of pandas https://stackoverflow.com/a/75282604/5034651

Solution

  • I figured it out:

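    The problem, I think, was that the conda environment was never actually activated by the script above. Presumably pip install and conda env export then ran against whichever environment was already active, so the exported file picked up that environment's packages. I was running into the error described here, and the fix was to add source ~/miniconda3/bin/activate at the top of my script. I did not need conda init bash, as some people suggested.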

    After this, I was able to build the environment I wanted with minimal dependencies.
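
Putting the fix together, a minimal sketch of the corrected script might look like the following. It assumes miniconda lives in ~/miniconda3 (as the which conda output in the question suggests). CONDA_ACTIVATE, load_conda, and export_clean_env are hypothetical names used only for this example, and the check on CONDA_DEFAULT_ENV is an extra guard that is not part of the fix described above.

```shell
#!/usr/bin/env bash
# Sketch only: CONDA_ACTIVATE, load_conda and export_clean_env are hypothetical names.

# Assumed miniconda location; override CONDA_ACTIVATE if yours differs.
CONDA_ACTIVATE="${CONDA_ACTIVATE:-$HOME/miniconda3/bin/activate}"

load_conda() {
    # `. file` is the portable spelling of `source file`. Sourcing the activate
    # script is what makes `conda activate` usable in a non-interactive script.
    . "$CONDA_ACTIVATE"
}

export_clean_env() {
    local env_name="$1" requirements="$2" output="$3"

    load_conda || return 1

    # Recreate the environment from scratch; a missing environment is fine here.
    conda remove --yes --name "$env_name" --all || true
    conda create --yes --name "$env_name" python=3.7 pip || return 1

    conda activate "$env_name" || return 1
    # Extra guard: stop unless the new environment is really the active one,
    # so pip never installs into (and export never reads from) another env.
    if [ "${CONDA_DEFAULT_ENV:-}" != "$env_name" ]; then
        echo "expected active env '$env_name', got '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi

    pip install -r "$requirements" || return 1
    conda env export > "$output" || return 1

    # Leave the environment and delete it again.
    conda deactivate
    conda remove --yes --name "$env_name" --all
}
```

It could then be called as export_clean_env py37_clean ../requirements.txt ../environment.yml. The activate script is sourced inside its own argument-free function so that it does not pick up the arguments passed to export_clean_env.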