
For SLURM clusters, why do we need to specify memory allocation for jobs?


Is it not possible for a computing cluster to just dynamically allocate memory as needed?


Solution

  • The issue is less about allocating memory than about knowing the shape of the workload, so that the scheduler can place it optimally (or at least non-problematically) in the cluster. A memory request lets the scheduler put each job on a node with enough free memory to handle it, which avoids the problems that occur when jobs use more memory than a node has available.

    If the scheduler simply started as many jobs as possible, ignoring how much memory each job needs and how much is available on each node, the jobs could exhaust a node's memory, causing severe performance degradation (paging), hangs, or OOM errors.
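To make the placement argument concrete, here is a minimal sketch of memory-aware placement. It is a toy first-fit loop, not SLURM's actual scheduling algorithm, and every name in it (`Node`, `place_jobs`, the node names and sizes) is hypothetical. The memory figures stand in for what a user would declare up front, for example with `sbatch --mem`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical compute node with a fixed amount of memory."""
    name: str
    total_mem_gb: int
    jobs: list = field(default_factory=list)  # (job_name, mem_gb) pairs

    @property
    def free_mem_gb(self) -> int:
        # Memory not yet promised to any job on this node.
        return self.total_mem_gb - sum(mem for _, mem in self.jobs)


def place_jobs(nodes, requests):
    """First-fit placement: put each job on the first node whose free
    memory covers the declared request; otherwise leave it pending."""
    placement = {}
    pending = []
    for job_name, mem_gb in requests:
        for node in nodes:
            if node.free_mem_gb >= mem_gb:
                node.jobs.append((job_name, mem_gb))
                placement[job_name] = node.name
                break
        else:
            # No node can hold the job right now, so it waits in the
            # queue instead of overcommitting a node's memory.
            pending.append(job_name)
    return placement, pending


# Illustrative cluster and job requests (all values are made up).
nodes = [Node("node01", 64), Node("node02", 128)]
requests = [("a", 48), ("b", 32), ("c", 96), ("d", 16), ("e", 64)]

placement, pending = place_jobs(nodes, requests)
print(placement)
print(pending)
```

Without the declared sizes, the loop would have no basis for skipping a node that is already nearly full, and it could not tell that the last job has to wait. With them, no node is ever promised more memory than it has.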