I have an old Python project that uses scikit-learn version 0.22.2.post1. Unfortunately I am unable to update to a newer version of scikit-learn, as the training data has long been lost and I understand that the version of scikit-learn is tied to the model (stored as a .pkl file).
The project uses Python 3.8 and works fine with that version, but I am trying to upgrade it to Python 3.9.19. I have managed to do this in my local dev environment, but when I try the same in my Azure DevOps pipeline, I get the following error after the command pip install --target="./.python_packages/lib/site-packages" -r ./requirements.txt is run:
Building wheels for collected packages: scikit-learn
Building wheel for scikit-learn (pyproject.toml): started
Building wheel for scikit-learn (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error
× Building wheel for scikit-learn (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [32 lines of output]
<string>:12: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
Partial import of sklearn during the build process.
Traceback (most recent call last):
File "<string>", line 195, in check_package_status
File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'numpy'
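As far as I can tell, this error only appears when pip has to build scikit-learn from the source distribution instead of installing a pre-built wheel. A quick way to check whether a pre-built wheel even exists for a given interpreter (just a diagnostic sketch; ./wheel-check is a scratch directory):

# Ask pip for a binary wheel only, without resolving any dependencies
python -m pip download scikit-learn==0.22.2.post1 --no-deps --only-binary=:all: -d ./wheel-check

If that command reports that it cannot find a matching distribution, there is no wheel for that Python version/platform and pip will fall back to building from the sdist.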
The fact that I can do the upgrade locally (same OS, same version of Python, same version of pip) gives me hope that this is a fixable problem. When I run the same command locally, pip outputs:
Installing collected packages: azure-functions, numpy, cython, pandas, nltk, flask, xgboost, scikit-learn, spacy
Successfully installed azure-functions-1.20.0 cython-0.29.36 flask-3.0.3 nltk-3.9.1 numpy-1.19.5 pandas-1.4.4 scikit-learn-0.22.2.post1 spacy-3.7.6 xgboost-1.1.1
So the big difference is that the pipeline attempts to build a wheel, while locally it does not. Perhaps I can work around this problem by getting the pipeline not to build a wheel? I have tried using --no-cache-dir or --no-binary="scikit-learn" when calling pip, but unfortunately it still attempts to build a wheel and therefore still fails. I have also tried running pip install numpy==1.19.5 immediately before the existing call to pip, in the hope that scikit-learn would then find numpy, but I still get the same error. I've also tried installing numpy and scikit-learn together in a separate call to pip (pip install --no-cache-dir --no-binary="scikit-learn" --target="./.python_packages/lib/site-packages" numpy==1.19.5 scikit-learn==0.22.2.post1), but again, same error. In case it matters, my requirements.txt file looks like this:
numpy<1.20.0
azure-functions
cython==0.29.36
scikit-learn==0.22.2.post1
pandas>=0.25.1
spacy==3.7.6
(a few other libraries that I don't think are relevant to the problem)
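For reference, this is roughly what an install without pip's build isolation would look like (untested on my side, and it assumes the build dependencies are installed into the interpreter's regular site-packages first; pip normally builds sdists in an isolated environment, and packages installed with --target are not visible to that build):

# Install build-time dependencies where the source build can see them
python -m pip install numpy==1.19.5 cython==0.29.36
# Install the requirements without creating an isolated build environment
pip install --no-build-isolation --target="./.python_packages/lib/site-packages" -r ./requirements.txt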
Is there any way to force pip to not build a wheel, or otherwise to fix this error where scikit-learn can't find numpy?
I think I've found a solution. Similar to the answer at https://stackoverflow.com/a/66603865/68846, I was able to work around the error by running python -m pip install --upgrade pip setuptools wheel scipy==1.10.1 cython==0.29.36 numpy==1.19.5 before proceeding with pip install -r requirements.txt as before.
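Put as commands (the second line is the original install, with the same --target flag as before):

# Upgrade the build tooling and preinstall the build dependencies first
python -m pip install --upgrade pip setuptools wheel scipy==1.10.1 cython==0.29.36 numpy==1.19.5
# Then run the original install
pip install --target="./.python_packages/lib/site-packages" -r ./requirements.txt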