I want to debug a Python application in PyCharm, with the interpreter set to a custom Docker image that uses TensorFlow and therefore requires a GPU. The problem is that PyCharm's command building doesn't offer a way to discover the available GPUs, as far as I can tell.
From a terminal, I can enter a container with the following command, specifying which GPUs to make available (--gpus):
docker run -it --rm --gpus=all --entrypoint="/bin/bash" 3b6d609a5189 # image has an entrypoint, so I overwrite it
Inside the container, I can run nvidia-smi to see that a GPU is found, and confirm that TensorFlow finds it, using:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
# physical_device_desc: "device: 0, name: Quadro P2000, pci bus id: 0000:01:00.0, compute capability: 6.1"]
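Depending on the TensorFlow version, there are also shorter checks (a sketch; tf.config.experimental.list_physical_devices only exists from TF 1.14 onwards):
import tensorflow as tf

print(tf.test.is_gpu_available())  # True when a usable GPU is found (TF 1.x API)
print(tf.config.experimental.list_physical_devices("GPU"))  # TF 1.14+
# e.g. [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]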
If I don't use the --gpus flag, no GPUs are discovered, as expected.
Note: with Docker version 19.03 and above, NVIDIA runtimes are supported natively, so there is no need for nvidia-docker; the docker run argument --runtime=nvidia is also deprecated. Relevant thread.
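A quick smoke test of that native support, driven from Python for convenience (a sketch: it assumes the docker CLI is on PATH and that the public nvidia/cuda:10.0-base image is available), should print the same nvidia-smi table:
import subprocess

# Equivalent to: docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi
subprocess.run(
    ["docker", "run", "--rm", "--gpus", "all", "nvidia/cuda:10.0-base", "nvidia-smi"],
    check=True,  # raises CalledProcessError if the container fails to start
)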
Here is the configuration for the run (I realise some of those paths might look incorrect, but that isn't an issue for now):
In PyCharm, I set the interpreter to point to the same Docker image and run the Python script, setting a custom LD_LIBRARY_PATH as an argument to the run that matches where libcuda.so is located in the Docker image (I found it interactively inside a running container), but still no device is found.
The error message shows that the CUDA library was loaded (i.e. it was found on that LD_LIBRARY_PATH), but the device was still not found. This is why I believe the docker run argument --gpus=all must be set somewhere, but I can't find a way to do that in PyCharm.
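The same distinction (library loads, but no device) can be reproduced outside TensorFlow with a small check against the CUDA driver API; a minimal sketch, assuming libcuda.so.1 is on LD_LIBRARY_PATH:
import ctypes

cuda = ctypes.CDLL("libcuda.so.1")  # an OSError here would mean the library itself is missing
status = cuda.cuInit(0)  # returns 0 on success, 100 (CUDA_ERROR_NO_DEVICE) when no GPU is exposed
count = ctypes.c_int(0)
if status == 0:
    cuda.cuDeviceGetCount(ctypes.byref(count))
print("cuInit status:", status, "- devices:", count.value)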
Other things I tried:

1. Adding --gpus=all to the Docker run options of the PyCharm configuration, but that seems not to be supported by the parser of those options:

2. Defining a runtime named nvidia in the docker daemon by including the following config in /etc/docker/daemon.json:

{ "runtimes": { "nvidia": { "runtimeArgs": ["gpus=all"] } } }

I am not sure of the correct format for this, however. I have tried a few variants of the above, but nothing got the GPUs recognised. The example above could at least be parsed (a quick parse check is sketched after this list) and allowed me to restart the docker daemon without errors.
3. I noticed that in the official TensorFlow docker images, they install a package (via apt install) called nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0, which sounds like a great tool, albeit seemingly just for TensorRT. I added it to my Dockerfile as a shot in the dark, but unfortunately it did not fix the issue.
4. Adding NVIDIA_VISIBLE_DEVICES=all etc. to the environment variables of the PyCharm configuration, with no luck.
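For completeness, the parse check mentioned in point 2 can be as simple as the following sketch (it only confirms the file is well-formed JSON and says nothing about whether Docker accepts the keys inside):
import json

with open("/etc/docker/daemon.json") as f:
    print(json.dumps(json.load(f), indent=2))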
I am using Python 3.6, PyCharm Professional 2019.3 and Docker 19.03.
It turns out that attempt 2 in the "Other things I tried" section of my post was the right direction, and using the following allowed PyCharm's remote interpreter (the docker image) to locate the GPU, just as the terminal was able to.
I added the following into /etc/docker/daemon.json:
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
It is also necessary to restart the docker service after saving the file:
sudo service docker restart
Note: this kills all running Docker containers on the system.
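To confirm the restarted daemon actually picked up the new default runtime, the DefaultRuntime field of docker info can be checked, for example from Python (a sketch; the --format template works with Docker 19.03):
import subprocess

out = subprocess.run(
    ["docker", "info", "--format", "{{.DefaultRuntime}}"],
    stdout=subprocess.PIPE, universal_newlines=True, check=True,
)
print(out.stdout.strip())  # expected: nvidia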