ssh slurm

Confused about SLURM: I SSH to a compute node with a private key, so how can SLURM access a compute node if I just add its name to slurm.conf?


I'd like to understand how slurmctld accesses the compute nodes and sends information to them. I'm still setting up SLURM, so it is not fully functional yet.

Currently I SSH to the compute node with a private key, i.e., ssh -i mykey.pem compute-node01. To set up SLURM, I just added the compute node name to slurm.conf (using the output of slurmd -C). Then I copied munge.key and slurm.conf to all nodes so that they are identical everywhere. It is not working yet: I get "munge credential not recognized". I wonder whether that is because every time I access a node I must type ssh -i mykey.pem compute-node0X, i.e., I must use a private key to reach each node...

I have the following questions:

  • How does SLURM get access to the other nodes? I never registered any IP anywhere; I just added the node name to slurm.conf using slurmd -C, which as far as I can tell says nothing that would establish a real connection. Is it because the nodes share the munge.key, and this key somehow contains an ssh -i privatekey IP style connection?
  • Is my key-based SSH access blocking SLURM, and is that why I get "credential not recognized"?

Thanks


Solution

  • Slurm does not use SSH to communicate. Once the munge daemon and the slurmd daemon are up and running on each node, the slurmd daemons communicate with the slurmctld daemon over Slurm-specific TCP ports, provided that:

    • they share the same munge key
    • the server running slurmd is registered in the slurm.conf of the server running slurmctld, under a name that resolves to its address (through DNS, /etc/hosts, or a NodeAddr entry)
    • the firewall does not block the Slurm ports (by default 6817 for slurmctld and 6818 for slurmd)

    The mykey.pem SSH key belongs to your user account and is unrelated to Slurm, so it does not block Slurm's own communication.
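
    Two of the conditions above (identical munge keys and reachable Slurm ports) can be checked with a small script. The following is only a rough sketch, not part of Slurm or munge: the helper names key_fingerprint and port_open are made up for illustration, and the key path /etc/munge/munge.key, the port 6818, and the host compute-node01 are assumptions that may not match your installation.

```python
import hashlib
import socket


def key_fingerprint(path):
    """Return the SHA-256 hex digest of a file, so copies can be compared across nodes."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


# Hypothetical usage (path, host, and port are assumptions; adjust to your setup):
#   print(key_fingerprint("/etc/munge/munge.key"))   # run on every node and compare
#   print(port_open("compute-node01", 6818))         # run from the slurmctld host
```

    Reading the munge key normally requires root, so the fingerprint check has to run with sufficient privileges on each node. If the digests differ between nodes, the keys are not the same. If port_open returns False, either slurmd is not running, the host name does not resolve, or a firewall is in the way.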