Tags: python, docker, tensorflow, pycharm, nvidia

PyCharm debugging using Docker with GPUs


The Goal:

To debug a Python application in PyCharm, where I set the interpreter to a custom docker image, using TensorFlow and therefore requiring a GPU. The problem is that, as far as I can tell, PyCharm's command-building doesn't offer a way to make the available GPUs visible to the container.

Terminal - it works:

Enter a container with the following command, specifying which GPUs to make available (--gpus):

docker run -it --rm --gpus=all --entrypoint="/bin/bash" 3b6d609a5189        # the image has an entrypoint, so I override it

Inside the container, I can run nvidia-smi to see that a GPU is found, and confirm that TensorFlow finds it, using:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()
# physical_device_desc: "device: 0, name: Quadro P2000, pci bus id: 0000:01:00.0, compute capability: 6.1"]
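
As a sanity check, the same information is available through the newer device API as well (a minimal sketch, assuming TensorFlow 1.14 or newer, where tf.config.experimental exists):

import tensorflow as tf

# Returns the list of GPUs TensorFlow can see; an empty list means none were found.
gpus = tf.config.experimental.list_physical_devices('GPU')
print(gpus)
# e.g. [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]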

If I don't use the --gpus flag, no GPUs are discovered, as expected. Note: with docker version 19.03 and above, NVIDIA runtimes are supported natively, so there is no need for nvidia-docker, and the docker run argument --runtime=nvidia is deprecated. Relevant thread.
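
For a quick one-shot check from the host, the same entrypoint-override trick works without an interactive shell (a sketch, reusing the image ID from above; --entrypoint replaces the image's entrypoint so nvidia-smi runs directly):

docker run --rm --gpus=all --entrypoint="nvidia-smi" 3b6d609a5189
# should print the usual nvidia-smi table, listing the Quadro P2000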

PyCharm - it doesn't work:

Here is the configuration for the run:

[screenshot: run configuration]

(I realise some of those paths might look incorrect, but that isn't an issue for now)

I set the interpreter to point to the same docker image and run the Python script. I also set a custom LD_LIBRARY_PATH as an argument to the run, matching where libcuda.so is located in the docker image (I found it interactively inside a running container), but still no device is found:

[screenshot: error message]

The error message shows that the CUDA library could be loaded (i.e. it was found on that LD_LIBRARY_PATH), but the device was still not found. This is why I believe the docker run argument --gpus=all must be set somewhere. I can't find a way to do that in PyCharm.
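
For reference, this is roughly how I located libcuda.so inside the image (a non-interactive sketch of what I did by hand; the exact path varies with the CUDA installation):

docker run --rm --gpus=all --entrypoint="/bin/bash" 3b6d609a5189 -c "find / -name 'libcuda.so*' 2>/dev/null"
# e.g. /usr/lib/x86_64-linux-gnu/libcuda.so.1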

Other things I have tried:

  1. In PyCharm, using a Docker execution template config (instead of a Python template) where it is possible to specify run arguments, so I hoped to pass --gpus=all, but that seems not to be supported by the parser of those options:

[screenshot: parse error]

  2. I tried to set the default runtime to be nvidia in the docker daemon by including the following config in /etc/docker/daemon.json:
{
    "runtimes": {
        "nvidia": {
            "runtimeArgs": ["gpus=all"]
        }
    }
}

I am not sure of the correct format for this, however. I have tried a few variants of the above, but nothing got the GPUs recognised. The example above could at least be parsed, and allowed me to restart the docker daemon without errors.

  3. I noticed that the official TensorFlow docker images install a package (via apt install) called nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0, which sounds like a great tool, albeit seemingly just for TensorRT. I added it to my Dockerfile as a shot in the dark, but unfortunately it did not fix the issue.

  4. Adding NVIDIA_VISIBLE_DEVICES=all etc. to the environment variables of the PyCharm configuration, with no luck; what I set is sketched below.
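
For reference, the environment variables I set looked like this (NVIDIA_DRIVER_CAPABILITIES is my guess at a sensible companion variable from the nvidia-container-runtime documentation; it did not help either):

NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility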

I am using Python 3.6, PyCharm Professional 2019.3 and Docker 19.03.


Solution

  • It turns out that attempt 2 in the "Other things I have tried" section of my post was the right direction, and the following allowed PyCharm's remote interpreter (the docker image) to locate the GPU, just as the terminal could.

    I added the following into /etc/docker/daemon.json:

    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }
    

    It is also necessary to restart the docker service after saving the file:

    sudo service docker restart
    

    Note: this kills all running docker containers on the system
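
    To confirm the change took effect before re-running the PyCharm configuration, the daemon's default runtime can be inspected from the host (a sketch; the grep simply filters docker info's output):

    docker info | grep -i runtime
    # Runtimes: nvidia runc
    # Default Runtime: nvidia

    After that, a container finds the GPU even without the --gpus flag, which is exactly what the docker run command PyCharm builds needs.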