
Condor central manager could not see the other computing nodes


I connected three servers to form an HPC cluster, using Condor as the middleware. When I run condor_status on the central manager, it does not show the other nodes. I can run jobs on the central manager and connect to the other nodes via SSH, so it seems that something is missing in the Condor configuration files. In them, I set the central manager as the Condor host and allowed read and write access for everyone. On the worker nodes, I kept the MASTER and STARTD daemons in the daemon list.

When I run condor_status on the central manager, it shows only the central manager. When I run it on a compute node, it gives the error "CEDAR:6001:Failed to connect to" followed by the central manager's IP address and port number.
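A "Failed to connect" error like this can be narrowed down by testing, from a compute node, whether the central manager's port accepts TCP connections at all. The following is a minimal sketch, not part of the original setup: check_port is a hypothetical helper, and it assumes bash and the coreutils timeout command are available on the node.

```shell
# Hypothetical helper: report whether a TCP port on a host accepts connections.
# Assumes bash (for its /dev/tcp redirection) and coreutils `timeout`.
check_port() {
    host="$1"
    port="$2"
    # Try to open a TCP connection, giving up after 3 seconds.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "reachable: $host:$port"
    else
        echo "unreachable: $host:$port"
        return 1
    fi
}

# Usage: pass the IP address and port from the CEDAR error message, e.g.
#   check_port <central-manager-ip> <port>
```

If SSH to the same host works but this check reports the port as unreachable, a firewall on the central manager is a likely cause.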


Solution

  • I managed to solve it. The problem was the firewall running on the central manager (iptables, in my case). Once I stopped the firewall (su -c "service iptables stop"), all nodes appeared normally in the output of condor_status.

    The firewall status can be checked using "service iptables status".
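Stopping the firewall entirely may not be desirable on a shared machine. As an alternative, here is a hedged sketch that opens only the collector port. It assumes the collector listens on HTCondor's default port, 9618; condor_allow_rule is a hypothetical helper that just prints the rule so it can be reviewed before being run as root. Depending on the Condor version and configuration, the daemons may use additional ports, so this single rule may not be sufficient.

```shell
# Assumption: the collector uses HTCondor's default port, 9618.
# Override CONDOR_PORT if the error message shows a different port.
CONDOR_PORT="${CONDOR_PORT:-9618}"

# Hypothetical helper: print an iptables rule that accepts incoming TCP
# connections on the given port. Nothing is changed until the rule is run.
condor_allow_rule() {
    printf 'iptables -I INPUT -p tcp --dport %s -j ACCEPT\n' "$1"
}

condor_allow_rule "$CONDOR_PORT"

# After reviewing the printed rule, apply it as root on the central manager:
#   su -c "$(condor_allow_rule "$CONDOR_PORT")"
# On RHEL/CentOS-style systems that use the iptables init script, the rule
# can be made persistent with:
#   su -c "service iptables save"
```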