I would like to add a first NVIDIA GPU to our Slurm cluster. From what I have read, I need to add some settings to slurm.conf and create a gres.conf.
Do you have an example of both files, at least the GPU-related part? I have put together something like the following, but I have no idea whether it is correct:
slurm.conf
GresTypes=gpu
Gres=gpu:nvidia:1
gres.conf
NodeName=worker1 Name=gpu File=/dev/nvidia0
Also, what is "File=/dev/nvidia0"?
Thank you
It is a good start. Make sure that
Gres=gpu:nvidia:1
is part of a node definition in slurm.conf, i.e. on a line starting with NodeName, and that in gres.conf you report the type as well:
NodeName=worker1 Name=gpu Type=nvidia File=/dev/nvidia0
You may also want to add gpu (written gres/gpu in that setting) to the list configured as AccountingStorageTRES so that the accounting tracks GPU usage as well.
The /dev/nvidia0 file is the special device file through which programs and libraries that use the GPU communicate with it. It is created on boot, or manually, for instance by running the nvidia-smi command. See this NVIDIA documentation for details.
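Putting the pieces from the question and this answer together, the GPU-related parts of the two files might look like the sketch below. The node name worker1 and the type label nvidia come from the question. The AccountingStorageTRES line is only useful if accounting is already set up on the cluster, and the other node parameters are left out here on the assumption that they already exist in your node definition.

```
# slurm.conf (GPU-related excerpt)
GresTypes=gpu
# Optional: only meaningful if accounting storage is configured
AccountingStorageTRES=gres/gpu
# Gres must sit on the node definition line; keep your existing
# CPUs/RealMemory/etc. parameters for worker1 on this same line
NodeName=worker1 Gres=gpu:nvidia:1

# gres.conf (on worker1, or shared across nodes)
# Type must match the type used in slurm.conf (here: nvidia)
NodeName=worker1 Name=gpu Type=nvidia File=/dev/nvidia0
```

After restarting slurmctld and the slurmd on worker1, scontrol show node worker1 should list the GPU in its Gres field, and a job requesting it with srun --gres=gpu:1 nvidia-smi is a quick way to check that the device is reachable from inside an allocation.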