What does the Slurm status down* mean?


Last night, I successfully logged into a Slurm environment. Today, I can't connect to the same environment. The sinfo command is currently giving me the following:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
p1           up 15-00:00:0      1   idle n1
p0*          up 15-00:00:0      1  down* n0

What does the status down* mean?

sinfo -R gives

REASON               USER      TIMESTAMP           NODELIST
Not responding       slurm     2023-07-08T03:37:34 n0

Bard says "down*" means the node is not responding to ping requests.

This makes sense, but I can't find documentation on this, and I'm a Slurm noob, so I wanted to confirm.


Solution

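  • Bard is correct. See the first item in the NODE STATE CODES section of the sinfo manpage (https://slurm.schedmd.com/sinfo.html#SECTION_NODE-STATE-CODES): the * suffix means the node is presently not responding and will not be allocated any new work. Here it is appended to the base state down, which marks the node as unavailable for use. This matches the "Not responding" reason reported by sinfo -R.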
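
If you want to check for this condition in a script, here is a minimal sketch that scans sinfo-style output for states ending in *. It assumes the default column layout shown in the question and works on a pasted copy of that output rather than calling sinfo itself; the function name non_responding is made up for illustration.

```python
# Sample text copied from the question. On a real cluster this would come
# from running `sinfo`; that step is deliberately left out of this sketch.
SINFO_OUTPUT = """\
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
p1           up 15-00:00:0      1   idle n1
p0*          up 15-00:00:0      1  down* n0
"""


def non_responding(sinfo_text):
    """Return (nodelist, base_state) pairs whose STATE ends in '*'.

    Hypothetical helper: assumes whitespace-separated columns and a
    header row containing STATE and NODELIST, as in the default layout.
    """
    lines = sinfo_text.strip().splitlines()
    header = lines[0].split()
    state_col = header.index("STATE")
    nodes_col = header.index("NODELIST")

    flagged = []
    for line in lines[1:]:
        fields = line.split()
        state = fields[state_col]
        # Only the STATE column is checked. The '*' after a partition
        # name (p0*) marks the default partition, not a node problem.
        if state.endswith("*"):
            flagged.append((fields[nodes_col], state.rstrip("*")))
    return flagged


print(non_responding(SINFO_OUTPUT))  # [('n0', 'down')]
```

Note that the NODELIST column can hold a compressed range such as n[0-3] when several nodes share a state; this sketch returns that string as-is and does not expand it.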