Setting a graphics card limit for Slurm partitions


I have a problem with Slurm partitions. To manage my users, I created the following four partitions:

PartitionName=small State=UP Nodes=ALL MaxTime=INFINITE MaxNodes=2 DefMemPerNode=32000 MaxMemPerNode=32000 MaxCPUsPerNode=16 Default=YES
PartitionName=medium State=UP Nodes=ALL MaxTime=INFINITE MaxNodes=2 DefMemPerNode=64000 MaxMemPerNode=64000 MaxCPUsPerNode=32
PartitionName=large State=UP Nodes=ALL MaxTime=INFINITE MaxNodes=3 DefMemPerNode=128000 MaxMemPerNode=128000 MaxCPUsPerNode=64
PartitionName=entire State=UP Nodes=ALL MaxTime=INFINITE MaxNodes=INFINITE MaxMemPerNode=INFINITE MaxCPUsPerNode=INFINITE

I would also like to add a GPU limit to each partition. What is the best way to do this?

The limits I want for the users of each partition are:

  • small and medium partitions: 1 GPU

  • large partition: 2 GPUs

  • entire partition: no limit

At first I thought I could simply set Gres on the partition, the same way it is requested when running a job, but partitions have no such key. I went through all the documentation and did not find a key that does what I want. The reason I need this is that users in every partition currently have unlimited access to the GPUs, so users of the small partition can occupy all of them, and I want to prevent that.


Solution

  • This can be done by creating a partition QOS for each partition, attaching it to the partition with the QOS= parameter, and setting a limit on it, for instance GrpTRES=gres/gpu=2.
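
As a rough sketch of what that could look like for the four partitions in the question: the QOS names (part_small, part_medium, part_large) are made up for illustration, and this assumes accounting through slurmdbd is already set up and that the GPUs are defined as gres/gpu. GrpTRES caps the total in use by all running jobs under that QOS; if the cap should apply to each user separately, MaxTRESPerUser=gres/gpu=N is the QOS option to look at instead.

```
# Run once as a Slurm administrator (QOS names are hypothetical):
#   sacctmgr add qos part_small
#   sacctmgr modify qos part_small set GrpTRES=gres/gpu=1
#   sacctmgr add qos part_medium
#   sacctmgr modify qos part_medium set GrpTRES=gres/gpu=1
#   sacctmgr add qos part_large
#   sacctmgr modify qos part_large set GrpTRES=gres/gpu=2

# slurm.conf: track GPUs as a TRES and enforce QOS limits
AccountingStorageTRES=gres/gpu
AccountingStorageEnforce=limits,qos

# Attach each QOS to its partition; "entire" gets none, so it stays unlimited
PartitionName=small State=UP Nodes=ALL MaxTime=INFINITE MaxNodes=2 DefMemPerNode=32000 MaxMemPerNode=32000 MaxCPUsPerNode=16 Default=YES QOS=part_small
PartitionName=medium State=UP Nodes=ALL MaxTime=INFINITE MaxNodes=2 DefMemPerNode=64000 MaxMemPerNode=64000 MaxCPUsPerNode=32 QOS=part_medium
PartitionName=large State=UP Nodes=ALL MaxTime=INFINITE MaxNodes=3 DefMemPerNode=128000 MaxMemPerNode=128000 MaxCPUsPerNode=64 QOS=part_large
PartitionName=entire State=UP Nodes=ALL MaxTime=INFINITE MaxNodes=INFINITE MaxMemPerNode=INFINITE MaxCPUsPerNode=INFINITE
```

After editing slurm.conf, the change should be picked up with scontrol reconfigure, and sacctmgr show qos can be used to check that the limits are stored as expected.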