Last night, I successfully logged into a Slurm environment. Today, I can't connect to the same environment. The sinfo
command is currently giving me the following:
PARTITION  AVAIL  TIMELIMIT   NODES  STATE  NODELIST
p1         up     15-00:00:0  1      idle   n1
p0*        up     15-00:00:0  1      down*  n0
What does the status down* mean?
sinfo -R gives:
REASON          USER   TIMESTAMP            NODELIST
Not responding  slurm  2023-07-08T03:37:34  n0
Bard says "down*" means the node is not responding to ping requests.
This makes sense, but I can't find documentation on this, and I'm a Slurm noob, so I wanted to confirm.
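Bard is correct. See the first item in the NODE STATE CODES section of the sinfo manpage:
https://slurm.schedmd.com/sinfo.html#SECTION_NODE-STATE-CODES
According to that entry, a "*" suffix on a state means the node is presently not responding and will not be allocated
any new work. That matches the "Not responding" reason reported by sinfo -R for n0.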
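
To check several nodes at once, the STATE column can be split into a base state and its suffix flags. The snippet below is only a rough sketch: the helper names (split_state, parse_sinfo) are made up for illustration, the suffix table is a partial paraphrase of the manpage section linked above and may differ between Slurm versions, and it parses the output pasted in the question rather than calling sinfo itself.

```python
# Suffix meanings paraphrased from the NODE STATE CODES section of the sinfo
# manpage. Partial list, assumed for illustration; check your Slurm version.
STATE_SUFFIXES = {
    "*": "not responding",
    "~": "powered off (power saving)",
    "#": "powering up or being configured",
    "$": "in a maintenance reservation",
    "@": "pending reboot",
}

# Output copied from the question, used here instead of running sinfo.
SAMPLE = """\
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
p1 up 15-00:00:0 1 idle n1
p0* up 15-00:00:0 1 down* n0
"""


def split_state(state):
    """Split a STATE value such as 'down*' into (base, [flag meanings])."""
    base = state.rstrip("".join(STATE_SUFFIXES))
    flags = [STATE_SUFFIXES[c] for c in state[len(base):]]
    return base, flags


def parse_sinfo(text):
    """Return (nodelist, base_state, flags) for each row of default sinfo output."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        partition, avail, timelimit, nodes, state, nodelist = line.split()
        base, flags = split_state(state)
        rows.append((nodelist, base, flags))
    return rows


for nodelist, base, flags in parse_sinfo(SAMPLE):
    print(nodelist, base, flags)
# n1 idle []
# n0 down ['not responding']
```

For the output in the question, this reports n0 as down with the "not responding" flag, which agrees with the reason shown by sinfo -R.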