I have an HBase cluster set up with 3 nodes: a NameNode and 2 DataNodes. The NameNode is a server with 4 GB of memory and a 20 GB hard disk, while each DataNode has 8 GB of memory and a 100 GB hard disk.
I'm using Apache Hadoop 2.7.2 and Apache HBase 1.2.4.
I've seen some people mention a Secondary NameNode.
My questions are:
- What is the impact of not having a Secondary NameNode in my setup?
The SecondaryNameNode periodically merges the namespace image with the edit log (a process called checkpointing). Your setup is not a High-Availability setup, so without a Secondary NameNode the edit log is only merged when the NameNode restarts. In the meantime it keeps growing, which eventually adds overhead to the NameNode during startup.
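As a rough illustration of what "periodically" means here, checkpoint frequency is governed by two hdfs-site.xml properties. The sketch below sets them to what I believe are the Hadoop 2.x defaults, so it changes nothing by itself; treat the values as a starting point and check them against the hdfs-default.xml shipped with your version.

```xml
<!-- hdfs-site.xml fragment: when the Secondary NameNode checkpoints. -->
<!-- Values shown are assumed to be the Hadoop 2.x defaults. -->

<!-- Maximum time between two checkpoints, in seconds (one hour). -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>

<!-- Checkpoint earlier if this many uncheckpointed transactions accumulate. -->
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>
```

A checkpoint is triggered by whichever of the two limits is reached first.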
- Is it possible to use one of the DataNodes as the Secondary NameNode?
Running the Secondary NameNode (SNN) on a DataNode host is not recommended; a separate host is preferred. The host chosen for the SNN should have about as much memory as the NN, because the SNN loads the namespace image into memory to perform the merge.
- If possible, how can I do it? (I listed only the NameNode in the /etc/hadoop/masters file.)
The masters file is not in use anymore. Add this property to hdfs-site.xml:
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>SNN_host:50090</value>
</property>
Also note that, by default, the SecondaryNameNode process is started on the node where start-dfs.sh is executed.
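For context, here is a minimal sketch of how that property sits inside a complete hdfs-site.xml. The hostname snn-host.example.com is hypothetical; replace it with the machine that should run the SNN. Port 50090 is the one used in the property above.

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml: minimal sketch, other cluster settings omitted. -->
<configuration>
  <!-- Host and HTTP port of the Secondary NameNode. -->
  <!-- snn-host.example.com is a placeholder, not a real host. -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>snn-host.example.com:50090</value>
  </property>
</configuration>
```

The same file should be present on every node, so that the start scripts and the NameNode agree on where the SNN lives.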