I am running a code that apparently requires NVIDIA apex (I initially didn't know and installed the wrong apex). I am unsure how to fix the final error:
(proxy) [jalal@goku proxynca_pp]$ CUDA_VISIBLE_DEVICES=0,1 python train.py --dataset cub --config config/cub.json --mode train --apex --seed 0
(1024, 4096)
train.py:12: MatplotlibDeprecationWarning: The 'warn' parameter of use() is deprecated since Matplotlib 3.1 and will be removed in 3.3. If any parameter follows 'warn', they should be pass as keyword, not positionally.
matplotlib.use('agg', warn=False, force=True)
Traceback (most recent call last):
File "train.py", line 70, in <module>
from apex import amp
File "/scratch3/venv/proxy/lib/python3.8/site-packages/apex/__init__.py", line 13, in <module>
from pyramid.session import UnencryptedCookieSessionFactoryConfig
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)
After I got the above error, I tried this answer: https://stackoverflow.com/a/67188946/2414957
(proxy) [jalal@goku proxynca_pp]$ pip uninstall apex
Found existing installation: apex 0.9.10.dev0
Uninstalling apex-0.9.10.dev0:
Would remove:
/scratch3/venv/proxy/lib/python3.8/site-packages/apex-0.9.10.dev0-py3.8.egg-info
/scratch3/venv/proxy/lib/python3.8/site-packages/apex/*
Proceed (Y/n)? y
Successfully uninstalled apex-0.9.10.dev0
(proxy) [jalal@goku proxynca_pp]$ git clone https://github.com/NVIDIA/apex
Cloning into 'apex'...
remote: Enumerating objects: 8256, done.
remote: Counting objects: 100% (343/343), done.
remote: Compressing objects: 100% (192/192), done.
remote: Total 8256 (delta 204), reused 240 (delta 139), pack-reused 7913
Receiving objects: 100% (8256/8256), 14.20 MiB | 0 bytes/s, done.
Resolving deltas: 100% (5605/5605), done.
(proxy) [jalal@goku proxynca_pp]$ cd apex
(proxy) [jalal@goku apex]$ pip install -v --disable-pip-version-check --no-cache-dir \
> --global-option="--cpp_ext" --global-option="--cuda_ext" ./
/scratch3/venv/proxy/lib/python3.8/site-packages/pip/_internal/commands/install.py:229: UserWarning: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option.
cmdoptions.check_install_build_global(options)
Using pip 21.2.4 from /scratch3/venv/proxy/lib/python3.8/site-packages/pip (python 3.8)
Processing /scratch3/research/code/fashion/proxynca_pp/apex
DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
Running command python setup.py egg_info
torch.__version__ = 1.9.0+cu111
running egg_info
creating /scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info
writing /scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info/PKG-INFO
writing dependency_links to /scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info/dependency_links.txt
writing top-level names to /scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info/top_level.txt
writing manifest file '/scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info/SOURCES.txt'
reading manifest file '/scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info/SOURCES.txt'
writing manifest file '/scratch/tmp/pip-pip-egg-info-yc32vm37/apex.egg-info/SOURCES.txt'
/scratch/tmp/pip-req-build-fg_khhkt/setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
Skipping wheel build for apex, due to binaries being disabled for it.
Installing collected packages: apex
Running command /scratch3/venv/proxy/bin/python3.8 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/scratch/tmp/pip-req-build-fg_khhkt/setup.py'"'"'; __file__='"'"'/scratch/tmp/pip-req-build-fg_khhkt/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /scratch/tmp/pip-record-u812zb2v/install-record.txt --single-version-externally-managed --compile --install-headers /scratch3/venv/proxy/include/site/python3.8/apex
torch.__version__ = 1.9.0+cu111
/scratch/tmp/pip-req-build-fg_khhkt/setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
from /usr/local/cuda-10.0/bin
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/scratch/tmp/pip-req-build-fg_khhkt/setup.py", line 159, in <module>
check_cuda_torch_binary_vs_bare_metal(CUDA_HOME)
File "/scratch/tmp/pip-req-build-fg_khhkt/setup.py", line 99, in check_cuda_torch_binary_vs_bare_metal
raise RuntimeError("Cuda extensions are being compiled with a version of Cuda that does " +
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 11.1.
In some cases, a minor-version mismatch will not cause later errors: https://github.com/NVIDIA/apex/pull/323#discussion_r287021798. You can try commenting out this check (at your own risk).
Running setup.py install for apex ... error
ERROR: Command errored out with exit status 1: /scratch3/venv/proxy/bin/python3.8 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/scratch/tmp/pip-req-build-fg_khhkt/setup.py'"'"'; __file__='"'"'/scratch/tmp/pip-req-build-fg_khhkt/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /scratch/tmp/pip-record-u812zb2v/install-record.txt --single-version-externally-managed --compile --install-headers /scratch3/venv/proxy/include/site/python3.8/apex Check the logs for full command output.
I have these packages installed:
(proxy) [jalal@goku apex]$ pip freeze
anykeystore==0.2
certifi==2021.5.30
charset-normalizer==2.0.4
cryptacular==1.6.2
cycler==0.10.0
defusedxml==0.7.1
greenlet==1.1.1
h5py==3.4.0
hupper==1.10.3
idna==3.2
joblib==1.0.1
kiwisolver==1.3.2
MarkupSafe==2.0.1
matplotlib==3.2.0
numpy==1.21.2
oauthlib==3.1.1
PasteDeploy==2.1.1
pbkdf2==1.3
Pillow==8.3.2
plaster==1.0
plaster-pastedeploy==0.7
pyparsing==2.4.7
pyramid==2.0
pyramid-mailer==0.15.1
python-dateutil==2.8.2
python3-openid==3.2.0
repoze.sendmail==4.4.1
requests==2.26.0
requests-oauthlib==1.3.0
scikit-learn==0.24.2
scipy==1.7.1
six==1.16.0
sklearn==0.0
SQLAlchemy==1.4.23
threadpoolctl==2.2.0
torch==1.9.0+cu111
torchaudio==0.9.0
torchvision==0.10.0+cu111
tqdm==4.62.2
transaction==3.0.1
translationstring==1.4
typing-extensions==3.10.0.2
urllib3==1.26.6
velruse==1.1.1
venusian==3.0.0
WebOb==1.8.7
WTForms==2.3.3
wtforms-recaptcha==0.3.2
zope.deprecation==4.4.0
zope.interface==5.4.0
zope.sqlalchemy==1.6
And here's the code is from this GitHub repo.
Edit: I found the steps through a stackoverflow answer that I can't find now (linked above). I don't know how to find the proper link or installation that is compatible with PyTorch 1.9.
FYI, the git repo has no installation instruction hence I am installing things blindly.
Installing CUDA 11.1
and then adding the following to ~/.bashrc
and sourcing the ~/.bashrc
and finally the symlink made it work:
export CUDA_HOME=/usr/local/cuda-11.1
export PATH=/usr/local/cuda-11.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64:$LD_LIBRARY_PATH
This eliminates the need to uninstall CUDA 10.2 especially if needed later for other project. Simply exporting the path and not using the symlink didn't work. $ sudo ln -sfT /usr/local/cuda/cuda-11.1/ /usr/local/cuda
^ Last command is assuming you have multiple CUDA versions installed in your machine. For further information please read this GitHub issue.