Tags: docker, docker-compose, nvidia, nvidia-docker

OCI runtime error when running Docker Compose with NVIDIA runtime, but not with docker run


I'm new to Stack Overflow and the NVIDIA runtime, and I'm trying to run a Docker container with the NVIDIA runtime using Docker Compose. However, I'm getting an error that I don't get when running the container directly with docker run.

Here's the relevant section of my docker-compose.yml file:

services:
  nvidia-test:
    image: nvidia/cuda:11.5.2-base-ubuntu20.04
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

When I run docker-compose up, I get the following error:

Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown

However, when I run the container directly with the following docker run command, I don't get this (or any) error:

sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.5.2-base-ubuntu20.04 nvidia-smi

I'm not sure what could be causing this error. Can someone help me understand the issue and how to resolve it so that I can run the container with the NVIDIA runtime using Docker Compose? I am currently using docker-compose version v2.16.0, and I installed the NVIDIA Container Toolkit following this link. Here are the NVIDIA driver and CUDA versions installed on my machine:

[Screenshot: GPU specs (NVIDIA driver and CUDA versions)]
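
For context, here is a sketch of the toolkit registration and verification steps I believe the linked guide uses (assumed from NVIDIA's standard install instructions, not copied from my shell history):

# Register the nvidia runtime with Docker (writes it into /etc/docker/daemon.json)
sudo nvidia-ctk runtime configure --runtime=docker
# Restart Docker so it picks up the new runtime
sudo systemctl restart docker
# 'nvidia' should appear among the listed runtimes
docker info | grep -i runtimes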

Please let me know if you need additional information from me to better understand the issue.

I have already run sudo systemctl status nvidia-persistenced to check the Persistence Daemon; it is active (running).


Solution

  • Adding sudo in front of docker-compose up solved the problem. I assume elevated privileges are required for Docker to access the necessary NVIDIA tools and libraries, just as with the sudo docker run ... command above. Also note that --runtime=nvidia in sudo docker run ... is no longer needed with newer nvidia-container-toolkit versions.
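
    For completeness, a minimal sketch of the fix plus an optional alternative; the docker group step is an assumption for running Compose without sudo and was not part of the original setup:

    # Run Compose with elevated privileges so the NVIDIA hook can load libnvidia-ml.so.1
    sudo docker-compose up

    # Optional (assumption): add your user to the docker group to avoid sudo,
    # then log out and back in for the change to take effect
    sudo usermod -aG docker $USER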