I'm working on a Red Hat 9.1 system and trying to enable GPU access for podman. (I actually want to run Docker with GPU access, but apparently that isn't supported on Red Hat 9.1.)
I followed these instructions and initially received an "insufficient permissions" error when I tried to run the example container, which simply runs nvidia-smi. (When I run nvidia-smi from the host, I get the expected output, and I'm able to run GPU-dependent jobs from the host, so the drivers and libraries appear to be correctly installed on the host at least.)
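For reference, the failing test was roughly this (the image and tag shown here are illustrative and may differ slightly from the ones in the instructions):
sudo podman run --rm --gpus all nvcr.io/nvidia/cuda:12.0.1-runtime-ubi8 nvidia-smi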
This issue and various others describe a fix for that problem in which /etc/nvidia-container-runtime/config.toml is adjusted so that #no-cgroups = false becomes no-cgroups = true. This removes the insufficient-permissions problem, but then I get the "GPU access blocked" error.
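For clarity, the edited part of config.toml looks like this in my copy of the file (the key sits under the [nvidia-container-cli] section; your layout may differ):
# /etc/nvidia-container-runtime/config.toml (excerpt)
[nvidia-container-cli]
# was: #no-cgroups = false
no-cgroups = true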
I found a number of other posts that gave a different fix for the first error, in which #user = "root:root" in config.toml is uncommented and changed to match the ownership of /dev/nvidia*. All of those devices are owned by root:root on this system, and making that change in config.toml did not help.
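For completeness, the ownership check those posts suggest is just:
ls -l /dev/nvidia*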
I found a few suggestions for the "blocked by the OS" error, but they consisted of the same set of changes to config.toml. Also, I have restarted the podman service after changing config.toml.
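(The restart was just the systemd unit, something along the lines of:)
sudo systemctl restart podman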
Turns out this is an SELinux file context issue.
You can see the file contexts by running ls -lZ /dev | grep nvidia
Running
chcon -t container_file_t /dev/nvidia*
changes the file context of the NVIDIA device nodes so that containers are allowed to access them. I also had to include the specific devices on the command line when running podman:
sudo podman run --gpus all --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia1:/dev/nvidia1 --device /dev/nvidiactl:/dev/nvidiactl --rm nvcr.io/nvidia/cuda:12.0.1-runtime-ubi8 nvidia-smi
This works, and it fixes the problem for Docker as well.
Note that chcon -t container_file_t /dev/nvidia* is temporary, and the file contexts will revert when the system is rebooted. To make the change permanent, run
semanage fcontext -a -t container_file_t '/dev/nvidia*'
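semanage only records the rule in the local policy; to apply it to the existing device nodes without waiting for a reboot, follow it with restorecon (assuming the standard policycoreutils tooling):
sudo restorecon -v /dev/nvidia*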