Search code examples
cgroups

What are cgroups and how are people using them for cluster administration?


Are there examples of how people are using cgroups to better manage research computing clusters that runs parallel scientific codes and serial codes for an academic community?


Solution

  • The primary example I'm aware of is to be able set the cluster scheduler (e.g. Slurm) to assign multiple jobs to a single node without worrying about a renegade job utilizing more resources than assigned.

    Cgroups is the mechanism so that the different jobs are only able to use the resources assigned to them by Slurm.

    Prior to having cluster schedulers capable of doing this many HPC Centers only allowed either one job per node or one user per node. Otherwise a job that requested only 1 core, for example, could, once running, actually use all the cores in the node which would cause other jobs on the node to run poorly.