I need to run some ML code on my laptop, and I need the GPU because of some dependencies listed in a requirements.txt file. However, it turns out that PyTorch (which I need at an older version, i.e. 1.7.0) cannot find any CUDA device, despite the GPU actually being present and the CUDA toolkit being installed.
PyTorch was installed through pip. I also tried installing PyTorch 1.8.0, which is compatible with CUDA <= 11.1 drivers (the oldest I can install on my WSL), but nothing changed from what happens below.
I have installed the NVIDIA drivers through this link, according to the documentation provided by NVIDIA.
GPU: GeForce GTX 1650 Ti
Windows 10 version: 21H2
WSL distro: Ubuntu 20.04
$ uname -r
5.10.60.1-microsoft-standard-WSL2
(3.7.10/envs/python37cuda) ➜ ~ nvidia-smi
Fri Jan 21 23:11:00 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.00 Driver Version: 510.06 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
| N/A 48C P8 5W / N/A | 518MiB / 4096MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
(3.7.10/envs/python37cuda) ➜ ~ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
(3.7.10/envs/python37cuda) ➜ ~ python
Python 3.7.10 (default, Jan 21 2022, 16:08:33)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
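For reference, one common cause worth ruling out at this point is that pip silently installed a CPU-only wheel of PyTorch; those carry a "+cpu" suffix in their version string, while CUDA builds carry "+cuXXX". A minimal sketch of that check, with the torch-specific calls left as comments since they assume torch is importable:

```python
def wheel_has_cuda(version: str) -> bool:
    """Return False if a PyTorch version string marks a CPU-only build."""
    return "+cpu" not in version

# Inside the interpreter session above, the analogous check would be:
#   import torch
#   torch.__version__    # e.g. "1.7.0+cpu" means a CPU-only wheel
#   torch.version.cuda   # None on CPU-only builds
print(wheel_has_cuda("1.7.0+cpu"))    # False: reinstall a CUDA wheel
print(wheel_has_cuda("1.7.0+cu110"))  # True
```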
Please note that I tried different versions of CUDA, namely 11.6 and 11.1, and nothing changed. Why can't PyTorch see the GPU, and why are the CUDA drivers reported as unavailable? Running nvidia-smi in PowerShell, however, does recognize the drivers.
Moreover:
lspci | grep NVIDIA
returns nothing.
In addition, running
docker run --rm --gpus=all nvidia/cuda:11.1-base nvidia-smi
Fri Jan 21 22:24:53 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.00 Driver Version: 510.06 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
| N/A 49C P8 4W / N/A | 501MiB / 4096MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
The docker container can see the GeForce GPU.
Whereas with the command:
docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Error: only 0 Devices available, 1 requested. Exiting.
it cannot find any device.
Any hint on how to solve this issue and get the GPU working?
EDIT:
Library and environment paths were both updated to point at the actual CUDA folder (in this case 11.1):
export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/lib64
Forgot to mention that, in PowerShell, nvidia-smi actually also shows the CUDA driver version.
EDIT:
Just found out that nvidia-smi.exe run from within WSL2 actually displays the CUDA version, just as it does in PowerShell.
Moreover:
➜ ~ ls -la /dev/dxg
crw-rw-rw- 1 root root 10, 63 Jan 21 22:21 /dev/dxg
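(For context: /dev/dxg is the device node through which WSL2 exposes the GPU, and the Windows driver also injects its user-mode libraries under /usr/lib/wsl/lib. A small sketch that checks both prerequisites; the paths are the WSL2 defaults and may differ on other setups:)

```python
import os

def wsl_gpu_paths_present(dev="/dev/dxg", libdir="/usr/lib/wsl/lib"):
    """Report which WSL2 GPU prerequisites exist on this system."""
    return {
        "dxg_device": os.path.exists(dev),  # GPU paravirtualization device node
        "wsl_libs": os.path.isdir(libdir),  # driver libraries injected by Windows
    }

print(wsl_gpu_paths_present())
```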
The tricky thing with WSL is that you can end up with multiple versions of Python: the distribution's version, the Windows version, Anaconda's, and many others. So you need to make sure you are using the right one.
If you are using Ubuntu, they have recommended steps for setting up CUDA, and it is actually quite easy. Check here - https://ubuntu.com/tutorials/enabling-gpu-acceleration-on-ubuntu-on-wsl2-with-the-nvidia-cuda-platform#1-overview
But basically the steps are as follows
sudo apt-key del 7fa2af80
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/3bf863cc.pub
sudo add-apt-repository 'deb https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/ /'
sudo apt-get update
sudo apt-get -y install cuda
Basically, you do not want to use the default CUDA packages provided by your distribution. The toolkit inside WSL needs to match the driver that Windows has installed.
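The matching rule boils down to: the toolkit's CUDA version inside WSL must not exceed the CUDA version the Windows driver reports (top right of nvidia-smi). A sketch of that comparison, assuming plain major.minor strings like those printed by nvcc and nvidia-smi:

```python
def toolkit_supported(toolkit: str, driver_cuda: str) -> bool:
    """True if the installed toolkit is covered by the driver's CUDA version."""
    def parts(v):
        return tuple(int(x) for x in v.split("."))
    return parts(toolkit) <= parts(driver_cuda)

# nvcc above reported 11.1; a driver exposing CUDA 11.6 covers it:
print(toolkit_supported("11.1", "11.6"))  # True
# The reverse (toolkit newer than the driver) does not:
print(toolkit_supported("11.6", "11.1"))  # False
```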
Now you can compile their test application to check that CUDA is working, like so:
git clone https://github.com/nvidia/cuda-samples
cd cuda-samples/Samples/1_Utilities/deviceQuery
make
./deviceQuery
I should also add that using the PyTorch website to download their latest stable version works as well. Go to their website rather than copying this command, as it is probably outdated depending on when you are reading this post.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
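The --index-url suffix encodes the CUDA version the wheels were built against (cu118 = CUDA 11.8). A sketch of how that URL is formed, assuming the naming pattern holds for other versions (check the PyTorch site for which indexes actually exist):

```python
def torch_index_url(cuda_version: str) -> str:
    """Build the PyTorch wheel index URL for a CUDA version, e.g. "11.8" -> .../whl/cu118."""
    tag = "cu" + cuda_version.replace(".", "")
    return "https://download.pytorch.org/whl/" + tag

print(torch_index_url("11.8"))  # https://download.pytorch.org/whl/cu118
```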