
What is the current recommended way to use the NVIDIA Container Toolkit with Docker Compose?


What is the equivalent of this docker command in Docker Compose?

docker run --rm -it --device=nvidia.com/gpu=all ubuntu:latest nvidia-smi

That command works for me:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:01:00.0 Off |                  N/A |
| 30%   28C    P0             26W /  165W |       1MiB /  16380MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

However, with the following docker-compose.yml:

services:
  testing:
    image: ubuntu:latest
    command: nvidia-smi
    environment:
      NVIDIA_VISIBLE_DEVICES: all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

... and running docker-compose up, I get the following output:

[+] Running 2/2
 ✔ Network testing_default      Created                                                                             0.1s
 ✔ Container testing-testing-1  Created                                                                             0.1s
Attaching to testing-1
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]

Things seem to have been in flux. The YAML above worked for me with nvidia-docker, but it does not work with nvidia-container-toolkit. I previously had to specify runtime: nvidia, but when I specify it now I get Error response from daemon: unknown or invalid runtime name: nvidia.

As a sanity check, the following command:

docker run --rm -it ubuntu:latest nvidia-smi

gives me the following output:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown.

Solution

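  • Thanks to ereslibre on GitHub, I have a solution: request the GPU through the cdi driver, using the same device name as in the working docker run command, instead of the nvidia driver.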

    services:
      testing:
        image: ubuntu:latest
        command: nvidia-smi
        deploy:
          resources:
            reservations:
              devices:
                - driver: cdi
                  device_ids:
                    - nvidia.com/gpu=all
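
As a variation on the solution above, a single GPU could probably be requested instead of all of them. This is only a sketch: it assumes the host's CDI spec also exposes per-index device names such as nvidia.com/gpu=0, which nvidia-ctk normally generates alongside nvidia.com/gpu=all. The index 0 is a placeholder for illustration; running nvidia-ctk cdi list on the host should show the names that are actually available.

```yaml
services:
  testing:
    image: ubuntu:latest
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            # CDI driver: the device is resolved from the host's CDI spec,
            # not from the legacy "nvidia" device driver
            - driver: cdi
              device_ids:
                # Assumed name for the first GPU only; replace it with a
                # device name that exists on your host
                - nvidia.com/gpu=0
```

This should correspond to docker run --device=nvidia.com/gpu=0 on the command line. If the daemon rejects the cdi driver, CDI support may need to be enabled in the Docker daemon configuration, since it was an opt-in feature in some Docker releases.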