Before this, I was able to connect to the GPU through CUDA runtime version 10.2. But then I ran into an error when setting up one of my projects.
Using torch 1.10.1+cu102 (NVIDIA GeForce RTX 3080)
UserWarning:
NVIDIA GeForce RTX 3080 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
After some reading, it seems that sm_86 is only supported by CUDA version 11.1 and above. That's why I upgraded to the latest CUDA version, and since then I can't connect to the GPU.
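To make the warning above concrete, here is a minimal sketch of the compatibility rule it reflects. It is a simplification, not PyTorch's actual check: it assumes a binary compiled for sm_XY runs only on devices with the same major version and an equal or higher minor version, and it ignores PTX forward compatibility. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def parse_sm(arch: str) -> Tuple[int, int]:
    """Split an architecture string such as 'sm_86' into (major, minor)."""
    digits = arch.split("_", 1)[1]  # e.g. "86"
    return int(digits[:-1]), int(digits[-1])


def device_is_supported(device_arch: str, compiled_archs: Iterable[str]) -> bool:
    """Return True if any compiled arch can run on the device (simplified rule)."""
    dev_major, dev_minor = parse_sm(device_arch)
    for arch in compiled_archs:
        major, minor = parse_sm(arch)
        # Same major version, and compiled for an equal or older minor version.
        if major == dev_major and minor <= dev_minor:
            return True
    return False


# Arch list taken from the warning message above.
compiled = ["sm_37", "sm_50", "sm_60", "sm_70"]
print(device_is_supported("sm_86", compiled))              # False
print(device_is_supported("sm_86", compiled + ["sm_80"]))  # True
```

On a working install, the list returned by torch.cuda.get_arch_list() could be passed in as compiled_archs.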
I have tried many things, including reinstalling the CUDA toolkit, PyTorch and torchvision, but nothing works.
CUDA Toolkit I've used:
$ wget https://developer.download.nvidia.com/compute/cuda/11.6.0/local_installers/cuda_11.6.0_510.39.01_linux.run
$ sudo sh cuda_11.6.0_510.39.01_linux.run
PyTorch I've installed (tried both conda and pip):
$ conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
$ pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
Here is some basic info:
(base) ubuntu@DESKTOP:~$ python
Python 3.9.5 (default, Jun 4 2021, 12:28:51)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.10.1+cu113'
>>> x = torch.rand(6,6)
>>> print(x)
tensor([[0.0228, 0.3868, 0.9742, 0.2234, 0.5682, 0.7747],
[0.2643, 0.3911, 0.3464, 0.5072, 0.4041, 0.4268],
[0.2247, 0.0936, 0.4250, 0.1128, 0.0261, 0.5199],
[0.0224, 0.7463, 0.1391, 0.8092, 0.3742, 0.2054],
[0.3951, 0.4205, 0.6270, 0.4561, 0.4784, 0.5958],
[0.8430, 0.5078, 0.7759, 0.5266, 0.4925, 0.7557]])
>>> torch.cuda.get_arch_list()
[]
>>> torch.cuda.is_available()
False
>>> torch.version.cuda
'11.3'
>>> torch.cuda.device_count()
0
Below are my configurations.
(base) ubuntu@DESKTOP:~$ ls -l /usr/local/ | grep cuda
lrwxrwxrwx 1 root root 21 Jan 24 13:47 cuda -> /usr/local/cuda-11.3/
lrwxrwxrwx 1 root root 25 Jan 17 10:52 cuda-11 -> /etc/alternatives/cuda-11
drwxr-xr-x 17 root root 4096 Jan 24 13:48 cuda-11.3
drwxr-xr-x 18 root root 4096 Jan 24 10:17 cuda-11.6
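With several toolkits installed side by side, it helps to see which directory each cuda* entry really points to. The following is a small sketch that does roughly what the ls listing above shows; the function name is hypothetical, and /usr/local is assumed as the install root only because that is where the listing above was taken.

```python
import os
from pathlib import Path


def list_cuda_toolkits(root="/usr/local"):
    """Map each cuda* entry under root to the real directory it resolves to."""
    root_path = Path(root)
    if not root_path.is_dir():
        return {}
    found = {}
    for entry in sorted(root_path.iterdir()):
        if entry.name.startswith("cuda"):
            # realpath follows symlink chains such as cuda -> cuda-11.3
            found[entry.name] = os.path.realpath(entry)
    return found


if __name__ == "__main__":
    for name, target in list_cuda_toolkits().items():
        print(f"{name} -> {target}")
```

Two entries that resolve to the same path are the same toolkit; entries that resolve to different paths are separate installs that may disagree on version.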
Ubuntu version:
(base) ubuntu@DESKTOP:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal
nvidia-smi:
(base) ubuntu@DESKTOP:~$ nvidia-smi
Mon Jan 24 17:22:42 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01 Driver Version: 511.23 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:02:00.0 Off | N/A |
| 0% 26C P8 5W / 320W | 106MiB / 10240MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 4009 G /Xorg N/A |
| 0 N/A N/A 4025 G /xfce4-session N/A |
| 0 N/A N/A 4092 G /xfwm4 N/A |
| 0 N/A N/A 25903 G /msedge N/A |
+-----------------------------------------------------------------------------+
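One detail worth reading off this output: the "CUDA Version" in the nvidia-smi header is the highest CUDA version the driver supports, not the version of any installed toolkit. A rough sketch of checking a runtime version against it follows; the helper names are hypothetical, and the header string is copied from the output above.

```python
import re


def parse_smi_header(line):
    """Extract (driver version, max supported CUDA version) from the header line."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


def version_tuple(version):
    """Turn '11.6' into (11, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_supports_runtime(driver_cuda, runtime_cuda):
    """True if the runtime version is not newer than what the driver supports."""
    return version_tuple(runtime_cuda) <= version_tuple(driver_cuda)


header = "| NVIDIA-SMI 510.39.01 Driver Version: 511.23 CUDA Version: 11.6 |"
driver, driver_cuda = parse_smi_header(header)
print(driver, driver_cuda)                           # 511.23 11.6
print(driver_supports_runtime(driver_cuda, "11.3"))  # True
```

By this check the driver itself is new enough for a cu113 build, which suggests the problem lies elsewhere in the setup.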
nvcc --version:
(base) ubuntu@DESKTOP:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
I'm answering my own question.
PyTorch pip wheels and Conda binaries ship with the CUDA runtime.
But that bundled CUDA runtime does not normally include NVCC, which has to be installed separately from conda-forge/cudatoolkit-dev, and that installation is very troublesome.
So, what I did was install NVCC from the Nvidia CUDA toolkit:
$ wget https://developer.download.nvidia.com/compute/cuda/11.6.0/local_installers/cuda_11.6.0_510.39.01_linux.run
And the Conda PyTorch GPU version:
$ conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
As it turns out, the two installations are not compatible with each other.
Therefore, the steps I took to solve this issue were to install cuda_11.3 first from Nvidia's official website, and then install the matching PyTorch wheel:
$ pip3 install torch==1.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
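A quick way to confirm that the toolkit and the wheel target the same CUDA release is to compare the nvcc release with the +cuXYZ tag in the PyTorch version string. This is only a sketch: the helper names are hypothetical, and it assumes the last digit of the tag is the minor version (as in cu102 or cu113).

```python
import re


def nvcc_release(nvcc_output):
    """Pull the 'release X.Y' version out of `nvcc -V` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def wheel_cuda_version(torch_version):
    """Convert a tag like '1.10.1+cu113' to '11.3'; None if there is no tag."""
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumption: the final digit is the minor version, the rest is the major.
    return f"{digits[:-1]}.{digits[-1]}"


# Sample strings copied from the session output in the question.
nvcc_text = "Cuda compilation tools, release 11.3, V11.3.58"
print(nvcc_release(nvcc_text))             # 11.3
print(wheel_cuda_version("1.10.1+cu113"))  # 11.3
print(nvcc_release(nvcc_text) == wheel_cuda_version("1.10.1+cu113"))  # True
```

In a live session, the strings would come from running nvcc -V and from torch.__version__ instead of the hard-coded samples.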