I am new to Hadoop ecosystem.
I recently tried Hadoop (2.7.1) on a single-node cluster without any problems and decided to move on to a multi-node cluster with 1 namenode and 2 datanodes.
However, I am facing a weird issue. Whatever jobs I try to run are stuck with the following message:
on the web interface:
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register
and in the cli:
16/01/05 17:52:53 INFO mapreduce.Job: Running job: job_1451083949804_0001
They don't even start and at this point I am not sure what changes I need to make in order to make it work.
Here's what I have tried in order to resolve it:
I would really appreciate any help (even a minute hint) in the correct direction.
I have followed these instructions (configuration):
I finally got this solved. Posting detailed steps for future reference (for a test environment only).
Hadoop (2.7.1) Multi-Node cluster configuration
execute these commands in a new terminal
[on all machines] ↴
stop-dfs.sh;stop-yarn.sh;jps
rm -rf /tmp/hadoop-$USER
[on Namenode/master only] ↴
rm -rf ~/hadoop_store/hdfs/datanode
[on Datanodes/slaves only] ↴
rm -rf ~/hadoop_store/hdfs/namenode
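After the stop scripts finish, jps on every machine should report nothing but the Jps process itself, confirming that no stale daemons from the single-node setup are still running.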
[on all machines] Add IP addresses and corresponding Host names for all nodes in the cluster.
sudo nano /etc/hosts
hosts
xxx.xxx.xxx.xxx master
xxx.xxx.xxx.xxy slave1
xxx.xxx.xxx.xxz slave2
# Additionally, you may need to remove lines like "xxx.xxx.xxx.xxx localhost", "xxx.xxx.xxx.xxy localhost", "xxx.xxx.xxx.xxz localhost" etc. if they exist.
# However, it's okay to keep lines like "127.0.0.1 localhost" and others.
[on all machines] Configure iptables
Allow the default or custom ports that you plan to use for the various Hadoop daemons through the firewall (a sketch follows after this step)
OR
much easier, disable iptables
on RedHat like distros (Fedora, CentOS)
sudo systemctl disable firewalld
sudo systemctl stop firewalld
on Debian like distros (Ubuntu)
sudo ufw disable
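If you would rather keep the firewall enabled, a middle-ground sketch (shown here for Ubuntu's ufw; the IP placeholders are the same hypothetical slave addresses used in the hosts file above) is to whitelist the other cluster nodes wholesale, since Hadoop daemons and application containers also use ephemeral ports that make per-port rules tedious:
# e.g. on the master: trust both slaves (repeat the idea on every node so each one trusts the others)
sudo ufw allow from xxx.xxx.xxx.xxy
sudo ufw allow from xxx.xxx.xxx.xxz
sudo ufw status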
[on Namenode/master only] Gain ssh access from the Namenode (master) to all Datanodes (slaves). (If no ssh key exists yet, see the note after this step.)
ssh-copy-id -i ~/.ssh/id_rsa.pub $USER@slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub $USER@slave2
confirm things by running ping slave1, ssh slave1, ping slave2, ssh slave2, etc. You should have a proper response. (Remember to exit each of your ssh sessions by typing exit or closing the terminal. To be on the safer side, I also made sure that all nodes were able to access each other and not just the Namenode/master.)
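If ~/.ssh/id_rsa.pub does not exist yet on the Namenode/master, generate the key pair first (run this before the ssh-copy-id commands above; an empty passphrase keeps the start/stop scripts non-interactive):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa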
[on all machines] edit core-site.xml file
nano /usr/local/hadoop/etc/hadoop/core-site.xml
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
    <description>NameNode URI</description>
  </property>
</configuration>
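To sanity-check that the value is actually being picked up on a machine, you can read it back from the loaded configuration:
hdfs getconf -confKey fs.defaultFS
# should print hdfs://master:9000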
[on all machines] edit yarn-site.xml file
nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
    <description>The hostname of the RM.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
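One assumption carried over from the single-node setup: mapred-site.xml should already point MapReduce at YARN, otherwise jobs run with the local runner instead of being submitted to the cluster. For reference, the relevant property is:
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>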
[on all machines] modify slaves file, remove the text "localhost" and add slave hostnames
nano /usr/local/hadoop/etc/hadoop/slaves
slaves
slave1
slave2
(I guess having this only on the Namenode/master would also work, but I did it on all machines anyway. Also note that in this configuration the master acts only as the namenode/resource manager and not as a worker; this is how I intend it to be.)
[on all machines] edit hdfs-site.xml file, set the value of dfs.replication to something > 1 (at least to the number of slaves in the cluster; here I have two slaves so I would set it to 2)
nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
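For reference, a minimal sketch of the file with two slaves (any dfs.namenode.name.dir / dfs.datanode.data.dir properties carried over from the single-node setup stay as they are, apart from the removals in the next two steps):
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication.</description>
  </property>
</configuration>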
[on Namenode/master only] remove the dfs.datanode.data.dir property from master's hdfs-site.xml file.
[on Datanodes/slaves only] remove the dfs.namenode.name.dir property from all slaves' hdfs-site.xml files.
[on Namenode/master only] (re)format the HDFS through the namenode
hdfs namenode -format
TESTING (execute only on Namenode/master)
start-dfs.sh;start-yarn.sh
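Before submitting anything, it is worth confirming that both slaves have actually registered with the master, since the "waiting for AM container to be allocated" symptom typically means YARN has no NodeManager resources to schedule on:
jps
# on master expect NameNode, SecondaryNameNode and ResourceManager; on each slave expect DataNode and NodeManager
hdfs dfsadmin -report
# should show "Live datanodes (2)"
yarn node -list
# should list slave1 and slave2 in RUNNING state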
echo "hello world hello Hello" > ~/Downloads/test.txt
hadoop fs -mkdir /input
hadoop fs -put ~/Downloads/test.txt /input
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /input /output
wait for a few seconds and the mapper and reducer should begin.
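Once the job completes, the result can be read back from HDFS (note that /output must not exist before the job is submitted, otherwise the job fails immediately):
hadoop fs -cat /output/part-r-00000
# expected output for the test file above: Hello 1, hello 2, world 1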
These links helped me with the issue: