
Docker WSL2 CUDA support: stdout: , stderr: Auto-detected mode as 'legacy' error


So I have CUDA 11.8 on Windows, and nvidia-smi runs fine on the Windows side:

Sat Dec  3 06:44:49 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 527.37       Driver Version: 527.37       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A30         TCC   | 00000000:41:00.0 Off |                    0 |
| N/A   48C    P0    39W / 165W |      0MiB / 24576MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ... WDDM  | 00000000:61:00.0  On |                  N/A |
| 24%   41C    P8    15W / 180W |   1411MiB /  8192MiB |     11%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    1   N/A  N/A      2632    C+G   ...y\ShellExperienceHost.exe    N/A      |
|    1   N/A  N/A      6040    C+G   ...ontend\Docker Desktop.exe    N/A      |
|    1   N/A  N/A      8588    C+G   C:\Windows\explorer.exe         N/A      |
|    1   N/A  N/A     10692    C+G   ...n1h2txyewy\SearchHost.exe    N/A      |
|    1   N/A  N/A     10724    C+G   ...artMenuExperienceHost.exe    N/A      |
|    1   N/A  N/A     13060    C+G   ...perience\NVIDIA Share.exe    N/A      |
|    1   N/A  N/A     14252    C+G   ...418.62\msedgewebview2.exe    N/A      |
|    1   N/A  N/A     14816    C+G   ...ge\Application\msedge.exe    N/A      |
|    1   N/A  N/A     15300    C+G   ...lPanel\SystemSettings.exe    N/A      |
|    1   N/A  N/A     15384    C+G   ...ck\app-4.29.149\slack.exe    N/A      |
|    1   N/A  N/A     16488    C+G   ...418.62\msedgewebview2.exe    N/A      |
|    1   N/A  N/A     18084    C+G   ...me\Application\chrome.exe    N/A      |
+-----------------------------------------------------------------------------+

Somehow it thinks it is CUDA 12... anyway WSL is running as WSL2:

PS C:\Users\olegj> wsl -l -v
  NAME                   STATE           VERSION
* docker-desktop         Running         2
  docker-desktop-data    Running         2

No matter how I run Docker with a GPU from Windows 11, I get this error:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: nvml error: unknown error: unknown.

Examples that produce this very same error:

 docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
 docker run --env NVIDIA_DISABLE_REQUIRE=1 --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody 342 nbody -gpu -benchmark
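
To tell this failure apart from other container start errors, here is a minimal Python sketch. It assumes the "nvidia-container-cli: initialization error: nvml error" fragment from the message above is a stable marker. The labels it returns are made up for illustration and are not part of Docker or the NVIDIA tooling.

```python
import subprocess

# Text fragment taken from the error message in the question.
NVML_INIT_SIGNATURE = "nvidia-container-cli: initialization error: nvml error"

# Smoke-test command copied from the question.
SMOKE_TEST = [
    "docker", "run", "--rm", "--gpus", "all",
    "nvidia/cuda:11.8.0-base-ubuntu20.04", "nvidia-smi",
]


def classify_gpu_run(returncode, stderr):
    """Label a finished `docker run` (hypothetical labels, for illustration)."""
    if returncode == 0:
        return "ok"
    if NVML_INIT_SIGNATURE in stderr:
        return "nvml-init-failure"
    return "other-failure"


def run_smoke_test():
    """Run the container and classify the outcome; requires Docker on PATH."""
    proc = subprocess.run(SMOKE_TEST, capture_output=True, text=True)
    return classify_gpu_run(proc.returncode, proc.stderr)


# Classify the stderr tail quoted in the question (any non-zero exit code).
sample_stderr = (
    "Auto-detected mode as 'legacy' nvidia-container-cli: "
    "initialization error: nvml error: unknown error: unknown."
)
print(classify_gpu_run(1, sample_stderr))
```

Calling run_smoke_test() on a machine with Docker installed runs the real container. The sample at the bottom only classifies the text quoted above and does not need Docker.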

My Docker version is 20.10.21, build baeda1f

And my Docker Desktop version is 4.15.0 (93002)

So what should I check, or how can I troubleshoot this error further on Windows 11?

Steps to reproduce: With one GPU on a fresh Windows install, everything works correctly. As soon as a second GPU (a different model from the first) is installed, I get this error, and it persists even after reinstalling CUDA, Docker, and CUDA in WSL.


Solution

  • So as commented:

    WSL 2 GPU acceleration will be available on Pascal and later GPU architecture on both GeForce and Quadro product SKUs in WDDM mode. It will not be available on Quadro GPUs in TCC mode or Tesla GPUs yet.
    
    And, for the purposes of this description, the A30 is a Tesla GPU (nvidia-smi above shows it in TCC mode). It is not supported in WSL.
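
The TCC/WDDM column in the nvidia-smi table from the question is what decides this. Here is a small Python sketch for spotting TCC devices. It assumes the input is the text printed on the Windows host by `nvidia-smi --query-gpu=index,name,driver_model.current --format=csv,noheader`. The function name and the sample text are illustrative only.

```python
import csv
import io

# Assumed input: the text printed on the Windows host by
#   nvidia-smi --query-gpu=index,name,driver_model.current --format=csv,noheader
# (driver_model.current is only reported on Windows).


def find_tcc_gpus(csv_text):
    """Return (index, name) for every GPU whose current driver model is TCC."""
    tcc = []
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        index, name, model = (field.strip() for field in row)
        if model.upper() == "TCC":
            tcc.append((int(index), name))
    return tcc


# Sample text mirroring the two GPUs from the question (the GeForce name is
# truncated there, so it is left truncated here as well).
sample = "0, NVIDIA A30, TCC\n1, NVIDIA GeForce ..., WDDM\n"
print(find_tcc_gpus(sample))
```

Any GPU this returns falls under the "TCC mode or Tesla" exclusion quoted above.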
    

    So the solution for me was as follows:

    1. Forget about hopes for WSL
    2. Install Windows Server with a Hyper-V VM using PCI-E passthrough
    3. Forget about that device on the Windows side
    4. Install Docker on Ubuntu inside the VM with passthrough. There were some UI rendering problems on Ubuntu, yet in WSL2 there would be only a terminal anyway, so meh.


    Sad results! I'll be hoping for an end to the discrimination against Tesla users in WSL3 =)