Tags: java, centos7, hadoop2

Hadoop: There are 0 datanodes running, and the DataNodes cannot connect to the NameNode


I'm having trouble setting up Hadoop. My setup consists of a NameNode VM and two separate physical DataNodes, all connected to the same network.

IP configuration:

  • 192.168.118.212 namenode-1
  • 192.168.118.217 datanode-1
  • 192.168.118.216 datanode-2

I keep getting an error saying there are 0 DataNodes running, but when I run jps on datanode-1 or datanode-2, the DataNode process shows up as running. My NameNode log shows this:

File /user/hadoop/.bashrc_COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.

The log on datanode-1 says that it has trouble connecting to the NameNode:

WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: namenode-1/192.168.118.212:9000

The odd part is that the DataNode starts but cannot connect. I can also SSH between all of the machines without any problems.

So my best guess is that I've configured one of the config files incorrectly, although I checked other questions on here and my files seem to be correct.

core-site.xml

<configuration>
<property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-1:9000/</value>
</property>
</configuration>
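As a side note, fs.default.name is the Hadoop 1 name of this property. Hadoop 2 still accepts it but logs a deprecation warning and prefers fs.defaultFS. A minimal sketch of the same core-site.xml using the current key, assuming the NameNode RPC port stays at 9000, could look like this. It would not by itself fix the connection problem; it only removes the deprecated name.

```xml
<configuration>
  <property>
    <!-- Hadoop 2 name for the deprecated fs.default.name key -->
    <name>fs.defaultFS</name>
    <!-- NameNode RPC address: DataNodes and clients connect here, so TCP 9000 must be reachable -->
    <value>hdfs://namenode-1:9000</value>
  </property>
</configuration>
```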

hdfs-site.xml

<configuration>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop_data/hdfs/datanode</value>
    <final>true</final>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop_data/hdfs/namenode</value>
    <final>true</final>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
</configuration>
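Similarly, dfs.permissions is the deprecated Hadoop 1 key; the Hadoop 2 name is dfs.permissions.enabled. Also, dfs.replication defaults to 3, which a cluster with only two DataNodes can never satisfy, so blocks would stay under-replicated even once the DataNodes connect. A sketch of hdfs-site.xml with both changes, assuming the two-DataNode layout described above (the replication value of 2 is a suggestion, not something from the original setup):

```xml
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop_data/hdfs/datanode</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop_data/hdfs/namenode</value>
    <final>true</final>
  </property>
  <property>
    <!-- Hadoop 2 name for the deprecated dfs.permissions key -->
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <!-- Assumption: only two DataNodes, so the default of 3 replicas cannot be met -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```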

mapred-site.xml

<configuration>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.job.tracker</name>
    <value>namenode-1:9001</value>
</property>
</configuration>
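Unrelated to the connection error, but worth noting: the JobTracker belongs to the old MRv1 framework, and as far as I know mapreduce.job.tracker is not a key that Hadoop 2 reads. With mapreduce.framework.name set to yarn, the ResourceManager location is normally configured in yarn-site.xml instead. A sketch, assuming (hypothetically) that the ResourceManager runs on namenode-1:

```xml
<configuration>
  <property>
    <!-- Assumption: the ResourceManager runs on the same host as the NameNode -->
    <name>yarn.resourcemanager.hostname</name>
    <value>namenode-1</value>
  </property>
  <property>
    <!-- Shuffle service that MapReduce jobs on YARN need on every NodeManager -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```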

Solution

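  • The problem was the firewall (firewalld, the default on CentOS 7), which was blocking the DataNodes' connections to the NameNode on port 9000. SSH still worked because the default firewalld zone allows the ssh service. You can stop the firewall by running systemctl stop firewalld.service. This lasts only until the next reboot; running systemctl disable firewalld.service also keeps it from starting at boot.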

    I found the answer here: https://stackoverflow.com/a/37994066/8789361
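Stopping firewalld turns the firewall off entirely. A narrower alternative is to open only the ports HDFS needs. firewalld reads custom service definitions from XML files in /etc/firewalld/services/. The sketch below uses a hypothetical service named hadoop-hdfs and assumes the Hadoop 2 default ports plus port 9000 from core-site.xml; check the ports actually configured on your cluster before relying on it.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical service file: /etc/firewalld/services/hadoop-hdfs.xml -->
<service>
  <short>hadoop-hdfs</short>
  <description>HDFS NameNode and DataNode ports (Hadoop 2 defaults plus fs.defaultFS port 9000)</description>
  <!-- NameNode RPC port from core-site.xml: the port datanode-1 failed to reach -->
  <port protocol="tcp" port="9000"/>
  <!-- NameNode web UI (dfs.namenode.http-address default) -->
  <port protocol="tcp" port="50070"/>
  <!-- DataNode data transfer, IPC and web UI (dfs.datanode.address, dfs.datanode.ipc.address, dfs.datanode.http.address defaults) -->
  <port protocol="tcp" port="50010"/>
  <port protocol="tcp" port="50020"/>
  <port protocol="tcp" port="50075"/>
</service>
```

After creating the file on each node, run firewall-cmd --reload so firewalld picks up the new service, then firewall-cmd --permanent --add-service=hadoop-hdfs followed by another firewall-cmd --reload. Unlike systemctl stop, this survives a reboot and leaves the rest of the firewall rules, including SSH, in place.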