Tags: python, numpy, scikit-learn, pip, python-3.9

Installing old version of scikit-learn: ModuleNotFoundError: No module named 'numpy'


I have an old Python project that uses scikit-learn version 0.22.2.post1. Unfortunately, I cannot update to a newer version of scikit-learn: the training data has long been lost, and as I understand it the model (stored as a .pkl file) is tied to the version of scikit-learn it was created with.

The project uses Python 3.8 and works fine with that version, but I am trying to upgrade it to Python 3.9.19. I have managed to do this in my local dev environment, but in my Azure DevOps pipeline I get the following error after the command pip install --target="./.python_packages/lib/site-packages" -r ./requirements.txt is run:

Building wheels for collected packages: scikit-learn
  Building wheel for scikit-learn (pyproject.toml): started
  Building wheel for scikit-learn (pyproject.toml): finished with status 'error'
  error: subprocess-exited-with-error
  
  × Building wheel for scikit-learn (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [32 lines of output]
      <string>:12: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
      Partial import of sklearn during the build process.
      Traceback (most recent call last):
        File "<string>", line 195, in check_package_status
        File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/importlib/__init__.py", line 127, in import_module
          return _bootstrap._gcd_import(name[level:], package, level)
        File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
        File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
        File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
      ModuleNotFoundError: No module named 'numpy'

The fact that I can do the upgrade locally (same OS, same version of Python, same version of pip) gives me hope that this is a fixable problem. When I run the same command locally, pip outputs:

Installing collected packages: azure-functions, numpy, cython, pandas, nltk, flask, xgboost, scikit-learn, spacy
Successfully installed azure-functions-1.20.0 cython-0.29.36 flask-3.0.3 nltk-3.9.1 numpy-1.19.5 pandas-1.4.4 scikit-learn-0.22.2.post1 spacy-3.7.6 xgboost-1.1.1

So the big difference is that the pipeline attempts to build a wheel, while my local machine does not. Perhaps I can work around the problem by getting the pipeline not to build a wheel? So far I have tried:

  • Passing --no-cache-dir or --no-binary="scikit-learn" to pip. It still attempts to build a wheel and therefore still fails.
  • Running pip install numpy==1.19.5 immediately before the existing call to pip, in the hope that scikit-learn would then find numpy. I still get the same error.
  • Installing numpy and scikit-learn together in a separate call to pip (pip install --no-cache-dir --no-binary="scikit-learn" --target="./.python_packages/lib/site-packages" numpy==1.19.5 scikit-learn==0.22.2.post1). Again, same error.

In case it matters, my requirements.txt file looks like this:

numpy<1.20.0
azure-functions
cython==0.29.36
scikit-learn==0.22.2.post1
pandas>=0.25.1
spacy==3.7.6
(a few other libraries that I don't think are relevant to the problem)

Is there any way to force pip to not build a wheel, or otherwise to fix this error where scikit-learn can't find numpy?
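
Since the post above relies on both machines having the same OS, Python, and pip, one quick check is to print the relevant facts on each machine and diff the output. The following is a minimal sketch using only the standard library. The function names (describe_environment, installed_version, format_report) and the package list are illustrative choices, not part of the original project.

```python
import platform
import sys
import sysconfig
from importlib import metadata

# Packages named in the question and answer; adjust as needed (illustrative list).
PACKAGES = ("pip", "setuptools", "wheel", "numpy", "cython", "scipy", "scikit-learn")


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def describe_environment(packages=PACKAGES):
    """Collect interpreter, platform, and package facts into a dict."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": sysconfig.get_platform(),
        "executable": sys.executable,
    }
    for name in packages:
        report[name] = installed_version(name) or "not installed"
    return report


def format_report(report):
    """Render the report as sorted 'key = value' lines, convenient for diffing."""
    return "\n".join(f"{key} = {value}" for key, value in sorted(report.items()))


print(format_report(describe_environment()))
```

Run it once locally and once as a pipeline step, then compare the two outputs. A difference in the pip, setuptools, or numpy lines would be a reasonable place to start looking, although it would not by itself prove the cause.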


Solution

  • I think I've found a solution. Similar to the answer at https://stackoverflow.com/a/66603865/68846, I was able to work around the error by running python -m pip install --upgrade pip setuptools wheel scipy==1.10.1 cython==0.29.36 numpy==1.19.5 first, and then running pip install -r requirements.txt as before.
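
The two-step order described above can be sketched in Python as follows. This is an illustration under assumptions, not a tested pipeline step: the version pins and the --target path are copied from the post, the helper names (build_commands, run_install) are hypothetical, and it is assumed that the second call keeps the --target option from the question. It only prints the commands unless dry_run is set to False.

```python
import shlex
import subprocess
import sys

# Build prerequisites taken from the answer; the pins are the answer's own,
# not a general recommendation.
BUILD_PREREQS = [
    "pip", "setuptools", "wheel",
    "scipy==1.10.1", "cython==0.29.36", "numpy==1.19.5",
]

# Target directory copied from the question.
TARGET_DIR = "./.python_packages/lib/site-packages"


def build_commands(requirements="./requirements.txt", target=TARGET_DIR):
    """Return the two pip invocations in the order they should run."""
    pip_install = [sys.executable, "-m", "pip", "install"]
    # Step 1: upgrade the packaging tools and install the build prerequisites.
    step1 = pip_install + ["--upgrade"] + BUILD_PREREQS
    # Step 2: the original install (assumed to keep --target as in the question).
    step2 = pip_install + [f"--target={target}", "-r", requirements]
    return [step1, step2]


def run_install(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run_install(dry_run=True)
```

Calling pip as sys.executable -m pip keeps both steps on the same interpreter, which matters on build agents where several Python versions are installed side by side.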