Tags: partition, hpc, slurm, sbatch

Is there a way to set certain nodes within a SLURM partition to be preferred over other nodes?


I have a cluster that consists mostly of CPU+GPU nodes with a couple of CPU-only nodes. At the moment they are in two partitions, 'gpuNodes' and 'cpuNodes', respectively. Our needs are growing, and our CPU-only jobs now need to use the CPU+GPU nodes in addition to the CPU-only nodes to complete in a timely fashion. I was thinking of creating an 'all' partition that contains the nodes from both of the existing partitions. Ideally, I'd like to fill up the CPU-only nodes before any jobs are sent to the CPU+GPU nodes.

This leads me to my question. Is there a way to set a priority/preference for a set of nodes within a partition so that a batch job assigned to the partition fills out the preferred nodes first? Or, if you know of a better way to accomplish my goals, I'm not set on the 'all' partition mentioned above.

If it helps, the naming schema for my nodes follows the syntax below:

Nodes with CPUs + GPUs: gn001-gn100
Nodes with CPUs only: n001-n020

Thank you in advance for your help!


Solution

  • This is typically done with the Weight parameter in slurm.conf.

    From the slurm.conf man page:

    All things being equal, jobs will be allocated the nodes with the lowest weight which satisfies their requirements.

    In your case, you would set something like:

    NodeName=n[001-020] ... Weight=10
    NodeName=gn[001-100] ... Weight=100
    

    Jobs that do not request GPUs will be allocated the CPU-only nodes first, and only if no CPU-only node is available will they be allocated CPU+GPU nodes. Jobs that request GPUs will, of course, only be allocated CPU+GPU nodes.
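
    A fuller sketch of how this might look in slurm.conf, keeping the two existing partitions and adding the combined 'all' partition from the question. The CPUs, RealMemory, and GPU counts are placeholders (assumptions, not values from the question), the n[001-020] range is read from the question's naming schema, and Gres=gpu:4 assumes the GPUs are already described in gres.conf on the gn nodes:

    # Hypothetical slurm.conf fragment; CPUs, RealMemory and GPU counts are placeholders
    GresTypes=gpu
    # Lower weight = preferred: CPU-only nodes are tried first
    NodeName=n[001-020] CPUs=32 RealMemory=128000 Weight=10
    NodeName=gn[001-100] CPUs=32 RealMemory=256000 Gres=gpu:4 Weight=100
    # Existing partitions kept as they are
    PartitionName=cpuNodes Nodes=n[001-020] State=UP
    PartitionName=gpuNodes Nodes=gn[001-100] State=UP
    # Combined partition; node weights decide the fill order inside it
    PartitionName=all Nodes=n[001-020],gn[001-100] State=UP

    With a layout like this, a CPU-only job submitted with "sbatch --partition=all job.sh" should land on n001-n020 while they have free resources, and a GPU job would add "--gres=gpu:1" so it can only be placed on the gn nodes. Changes to slurm.conf typically need "scontrol reconfigure" (some node-definition changes require restarting the Slurm daemons), and "sinfo -N -o '%N %w'" lists each node's scheduling weight so you can confirm the setting took effect.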