I have a problem with Slurm partitions. To manage my users, I created the following four partitions:
PartitionName=small State=UP Nodes=ALL MaxTime=INFINITE MaxNodes=2 DefMemPerNode=32000 MaxMemPerNode=32000 MaxCPUsPerNode=16 Default=YES
PartitionName=medium State=UP Nodes=ALL MaxTime=INFINITE MaxNodes=2 DefMemPerNode=64000 MaxMemPerNode=64000 MaxCPUsPerNode=32
PartitionName=large State=UP Nodes=ALL MaxTime=INFINITE MaxNodes=3 DefMemPerNode=128000 MaxMemPerNode=128000 MaxCPUsPerNode=64
PartitionName=entire State=UP Nodes=ALL MaxTime=INFINITE MaxNodes=INFINITE MaxMemPerNode=INFINITE MaxCPUsPerNode=INFINITE
I would also like to be able to add a graphics card limit to each partition.
What is the best way to do this?
Specifically, I want the users of each partition to have a GPU limit as follows:
small and medium partitions: 1 GPU
large partition: 2 GPUs
entire partition: no limit
At first I thought I could simply set a Gres limit on the partition, the same way GPUs are requested when running a job, but there is no such partition key. I went through the documentation and did not find anything similar to what I want. The reason I need this is that users in every partition currently have unlimited access to the graphics cards, so users of the small partition occupy all of them, and I want to prevent this.
That could be done by creating a partition QOS for each partition and setting a limit on it, for instance GrpTRES=gres/gpu=2, which caps the total number of GPUs in use by all jobs running under that QOS.
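As a rough sketch of how that might look: the QOS names below (small_gpu, medium_gpu, large_gpu) are placeholders I made up, and the helper only prints the sacctmgr commands so they can be reviewed before being applied. It assumes accounting (slurmdbd) is already set up on the cluster.

```shell
#!/bin/sh
# Sketch only: the QOS names are hypothetical, pick whatever fits your site.
# qos_cmds prints the sacctmgr commands instead of running them;
# pipe the output to `sh` on a host with Slurm admin rights to apply them.
qos_cmds() {
    # $1 = QOS name, $2 = GPU cap for all running jobs in that QOS combined
    echo "sacctmgr -i add qos $1"
    echo "sacctmgr -i modify qos $1 set GrpTRES=gres/gpu=$2"
}

qos_cmds small_gpu 1
qos_cmds medium_gpu 1
qos_cmds large_gpu 2

# Afterwards, slurm.conf needs roughly the following (then `scontrol reconfigure`):
#   AccountingStorageTRES=gres/gpu        # track GPUs as a TRES
#   AccountingStorageEnforce=limits,qos   # without this, limits are not enforced
#   PartitionName=small  ... QOS=small_gpu
#   PartitionName=medium ... QOS=medium_gpu
#   PartitionName=large  ... QOS=large_gpu
# The "entire" partition gets no QOS, so it stays unlimited.
```

Note that GrpTRES is an aggregate limit for everything running under the QOS. If the intent is a limit per user rather than per partition, MaxTRESPerUser=gres/gpu=1 on the same QOS may be closer to what is wanted.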