
How to add an NVIDIA GPU to slurm.conf and gres.conf


I would like to add a first NVIDIA GPU to my Slurm cluster. From what I have read, I have to add some settings to slurm.conf and create a gres.conf, more or less.

Do you have an example of both files, specifically the GPU-related parts? I have put together something like this, but I have no idea whether it is correct:

slurm.conf

GresTypes=gpu
Gres=gpu:nvidia:1

gres.conf

NodeName=worker1 Name=gpu File=/dev/nvidia0

What is "File=/dev/nvidia0"?

Thank you


Solution

  • It is a good start. Make sure that:

    • the Gres=gpu:nvidia:1 is part of a node definition in slurm.conf, i.e. on a line starting with NodeName
    • in gres.conf you report the type as well: NodeName=worker1 Name=gpu Type=nvidia File=/dev/nvidia0
    • you add gpu (written as gres/gpu) to the list configured as AccountingStorageTRES, so that accounting tracks GPU usage as well.
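
    Putting the three points together, the GPU-related parts of both files might look like the sketch below. It reuses the node name worker1 and the type nvidia from the question. The NodeName line is deliberately minimal, and a real node definition would normally carry further attributes (CPUs, memory and so on) that are not shown here. Treat it as a starting point under those assumptions, not a complete configuration.

    ```
    # slurm.conf (GPU-related lines only)
    GresTypes=gpu
    # gres/gpu adds GPUs to the resources tracked by accounting
    AccountingStorageTRES=gres/gpu
    # Gres= belongs on the node definition; other node attributes are omitted here
    NodeName=worker1 Gres=gpu:nvidia:1

    # gres.conf
    # Type= must match the type used in Gres= in slurm.conf
    NodeName=worker1 Name=gpu Type=nvidia File=/dev/nvidia0
    ```

    If AccountingStorageTRES is already set in your slurm.conf, append gres/gpu to the existing comma-separated list instead of adding a second line.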

    /dev/nvidia0 is the special device file through which programs and libraries that use the GPU communicate with it. It is created at boot, or manually, for instance by running the nvidia-smi command. See this NVIDIA documentation for details.