
How does the master node start all the processes in a Hadoop cluster?


I have set up a Hadoop cluster of 5 virtual machines using plain vanilla Hadoop. The cluster details are below:

192.168.1.100 - Configured to Run NameNode and SecondaryNameNode (SNN) daemons.
192.168.1.101 - Configured to Run ResourceManager daemon.
192.168.1.102 - Configured to Run DataNode and NodeManager daemons.
192.168.1.103 - Configured to Run DataNode and NodeManager daemons.
192.168.1.104 - Configured to Run DataNode and NodeManager daemons.

I have kept the masters and slaves files on each virtual server.

masters file:

192.168.1.100
192.168.1.101

slaves file:

192.168.1.102
192.168.1.103
192.168.1.104

Now, when I run the start-all.sh command from the NameNode machine, how is it able to start all the daemons? I do not understand this. There are no adapters installed (or none that I am aware of); only the plain Hadoop jars are present on each machine. So how is the NameNode machine able to start the daemons on all the other machines (virtual servers)?

Can anyone help me understand this?


Solution

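  • The start scripts run on the namenode machine connect to the slaves via SSH and start the slave services there; the NameNode daemon itself does not launch anything. start-all.sh takes the hosts listed in the slaves file and runs the daemon start command on each of them over SSH. That is why the public SSH key of the user running the Hadoop scripts must be in ~/.ssh/authorized_keys on each slave, with the matching private key present on the machine where you run start-all.sh.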
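
To make the mechanism concrete, here is a minimal Python sketch of what the start scripts roughly do for the slave daemons: read the slaves file, then issue one ssh command per host. This is an illustration, not Hadoop's actual code (the real logic lives in shell scripts under sbin/). The function names parse_hosts and build_ssh_commands are hypothetical, and the remote command assumes a Hadoop 2 layout with HADOOP_HOME set on every slave. The sketch only builds and prints the commands; it does not execute them.

```python
from typing import List


def parse_hosts(slaves_text: str) -> List[str]:
    """Return the host entries from slaves-file text, skipping blanks and # comments."""
    hosts = []
    for line in slaves_text.splitlines():
        host = line.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if host:
            hosts.append(host)
    return hosts


def build_ssh_commands(hosts: List[str], remote_command: str) -> List[List[str]]:
    """Build one ssh argv per host; each would run remote_command on that host."""
    return [["ssh", host, remote_command] for host in hosts]


if __name__ == "__main__":
    # Contents of the slaves file from the question.
    slaves_text = "192.168.1.102\n192.168.1.103\n192.168.1.104\n"
    # Assumed remote command (Hadoop 2 layout); adjust to your installation.
    remote = "$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode"
    for argv in build_ssh_commands(parse_hosts(slaves_text), remote):
        print(" ".join(argv))
```

Each printed line corresponds to one SSH login from the machine running the script to a slave. Without passwordless key-based login, every one of those logins would prompt for a password, which is why the authorized_keys setup matters.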