Tags: docker, ubuntu, pytorch, windows-subsystem-for-linux

The host has CUDA and a GPU installed, but PyTorch (in WSL2) cannot find them


I need to run some ML code on my laptop with the GPU, and because of dependency constraints in a requirements.txt file I need an older version of PyTorch (1.7.0). However, it turns out that PyTorch cannot find any CUDA device, even though the GPU is actually present and the CUDA toolkit has been installed.

PyTorch was installed through pip. I also tried installing PyTorch 1.8.0, which is compatible with CUDA <= 11.1 drivers (the oldest toolkit I can install on my WSL), but nothing changed in the results below.

I installed the NVIDIA drivers through this link, following the documentation provided by NVIDIA.

GPU: GeForce GTX 1650 Ti

Windows 10 version: 21H2

WSL distro: Ubuntu 20.04

$ uname -r
5.10.60.1-microsoft-standard-WSL2

(3.7.10/envs/python37cuda) ➜  ~ nvidia-smi
Fri Jan 21 23:11:00 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.00       Driver Version: 510.06       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   48C    P8     5W /  N/A |    518MiB /  4096MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(3.7.10/envs/python37cuda) ➜  ~ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

(3.7.10/envs/python37cuda) ➜  ~ python
Python 3.7.10 (default, Jan 21 2022, 16:08:33)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False

Please note that I tried different versions of CUDA, namely 11.6 and 11.1, and nothing changed. Why can't PyTorch see the GPU, and why is CUDA reported as unavailable? When I run nvidia-smi in PowerShell, however, it does recognize the drivers.
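One thing worth ruling out first (this is an assumption on my part, not something shown above): for older releases like 1.7.0, pip can serve a CPU-only wheel, in which case CUDA will never be detected regardless of drivers. The version metadata tells you which build you have: a `+cpu` suffix in `torch.__version__`, or `torch.version.cuda` being `None`, means a CPU-only wheel. A minimal sketch of that check:

```python
# Sketch: decide whether a torch build is CPU-only from its version metadata.
# A local version tag of "+cpu", or a missing CUDA build version, means the
# wheel was compiled without CUDA support.
def is_cpu_only_build(version_string, cuda_version):
    return version_string.endswith("+cpu") or cuda_version is None

# In a real session, pass torch.__version__ and torch.version.cuda:
print(is_cpu_only_build("1.7.0+cpu", None))      # True: CPU-only wheel
print(is_cpu_only_build("1.8.0+cu111", "11.1"))  # False: CUDA 11.1 build
```

In the REPL above, `print(torch.__version__, torch.version.cuda)` would give you the two inputs for your own install.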

Moreover: lspci | grep NVIDIA returns nothing.

In addition, running docker run --rm --gpus=all nvidia/cuda:11.1-base nvidia-smi gives:

Fri Jan 21 22:24:53 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.00       Driver Version: 510.06       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   49C    P8     4W /  N/A |    501MiB /  4096MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

So the Docker container can see the GeForce GPU.

Whereas with the command: docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

Error: only 0 Devices available, 1 requested.  Exiting.

it cannot find anything.

Any hint on how to solve this issue?

EDIT:

Library and environment paths were both updated to point at the actual CUDA folder (in this case, 11.1):

export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/lib

I forgot to mention that, in PowerShell, nvidia-smi actually shows the CUDA driver version as well.

EDIT: I just found out that nvidia-smi.exe run from inside WSL2 actually displays the CUDA version, just as it does in PowerShell. Moreover:

➜  ~ ls -la /dev/dxg
crw-rw-rw- 1 root root 10, 63 Jan 21 22:21 /dev/dxg
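Since `/dev/dxg` exists, the paravirtualized GPU is exposed to WSL. The other piece WSL needs is the `libcuda.so` provided by the Windows driver, which normally lives under `/usr/lib/wsl/lib` (a standard-layout assumption on my part; adjust the path if your install differs). A small sketch of both checks:

```shell
# Sketch of WSL2 GPU sanity checks; paths assume the standard WSL layout.
if [ -e /dev/dxg ]; then
    echo "dxg device: present"
else
    echo "dxg device: missing (GPU paravirtualization not active)"
fi

if [ -e /usr/lib/wsl/lib/libcuda.so ]; then
    echo "WSL libcuda.so: present"
else
    echo "WSL libcuda.so: missing (check the Windows NVIDIA driver install)"
fi
```

If that `libcuda.so` is missing, or shadowed by a toolkit-installed stub earlier in `LD_LIBRARY_PATH`, CUDA applications inside WSL can fail even though `nvidia-smi.exe` works.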

Solution

  • The tricky thing with WSL is that you can have multiple Python installations: the distribution's version, the Windows version, Anaconda, and many others. So you need to ensure you are using the right one.

    If you are using Ubuntu they have recommended steps for setting up CUDA. It is actually quite easy. Check here - https://ubuntu.com/tutorials/enabling-gpu-acceleration-on-ubuntu-on-wsl2-with-the-nvidia-cuda-platform#1-overview

    But basically, the steps are as follows:

    sudo apt-key del 7fa2af80
    wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
    sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/3bf863cc.pub
    sudo add-apt-repository 'deb https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/ /'
    sudo apt-get update
    sudo apt-get -y install cuda
    

    Basically, you do not want to use the default CUDA version provided by your distribution. It needs to match what Windows has installed.
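One way to make that "must match" rule concrete (my reading of the constraint, not NVIDIA's exact wording): the driver's CUDA version reported by nvidia-smi on the Windows side must be at least as new as the toolkit version installed inside WSL. A sketch of the comparison:

```python
# Sketch: a CUDA toolkit inside WSL is usable only if the Windows driver's
# CUDA version (shown by nvidia-smi in PowerShell) is at least as new.
def toolkit_matches_driver(driver_cuda: str, toolkit_cuda: str) -> bool:
    """Compare dotted CUDA versions such as '11.6' and '11.1'."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(driver_cuda) >= parse(toolkit_cuda)

print(toolkit_matches_driver("11.6", "11.1"))  # True: driver is new enough
print(toolkit_matches_driver("11.1", "11.6"))  # False: toolkit too new
```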

    Now you can compile NVIDIA's test application to check whether CUDA is working:

    git clone https://github.com/nvidia/cuda-samples
    cd cuda-samples/Samples/1_Utilities/deviceQuery
    make
    ./deviceQuery
    

    I should also add that installing the latest stable version from the PyTorch website works as well. Go to their website rather than copying the command below, as it will probably be outdated by the time you read this post.

    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
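After installing a CUDA-enabled wheel, a quick post-install sanity check looks like this (a sketch that degrades gracefully when torch is absent from the interpreter, rather than crashing):

```python
# Sketch of a post-install check: report the torch version, the CUDA version
# it was built against (None means a CPU-only wheel), and device visibility.
def cuda_report():
    try:
        import torch
    except ImportError:
        return None  # torch is not installed in this interpreter
    return (torch.__version__, torch.version.cuda, torch.cuda.is_available())

print(cuda_report())
```

On a correctly configured WSL2 setup, the tuple should show a `+cuXXX` version, a non-None CUDA build version, and `True`.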