Search code examples
hadoophdfshadoop2high-availability

Hadoop 2.0 Name Node, Secondary Node and Checkpoint node for High Availability


After reading Apache Hadoop documentation , there is a small confusion in understanding responsibilities of secondary node & check point node

I am clear on Namenode role and responsibilities:

  • The NameNode stores modifications to the file system as a log appended to a native file system file, edits. When a NameNode starts up, it reads HDFS state from an image file, fsimage, and then applies edits from the edits log file. It then writes new HDFS state to the fsimage and starts normal operation with an empty edits file. Since NameNode merges fsimage and edits files only during start up, the edits log file could get very large over time on a busy cluster. Another side effect of a larger edits file is that next restart of NameNode takes longer.

But I have a small confusion in understanding Secondary namenode & Check point namenode responsibilities.

Secondary NameNode:

  • The secondary NameNode merges the fsimage and the edits log files periodically and keeps edits log size within a limit. It is usually run on a different machine than the primary NameNode since its memory requirements are on the same order as the primary NameNode.

Check point node:

  • The Checkpoint node periodically creates checkpoints of the namespace. It downloads fsimage and edits from the active NameNode, merges them locally, and uploads the new image back to the active NameNode. The Checkpoint node usually runs on a different machine than the NameNode since its memory requirements are on the same order as the NameNode. The Checkpoint node is started by bin/hdfs namenode -checkpoint on the node specified in the configuration file.

It seems that responsibility between Secondary namenode & Checkpoint node are not clear. Both are working on edits. So who will modify finally?

On a different note, I have created two bugs in jira to remove ambiguity in understanding these concepts.

issues.apache.org/jira/browse/HDFS-8913 
issues.apache.org/jira/browse/HDFS-8914 

Solution

  • NameNode(Primary)

    The NameNode stores the metadata of the HDFS. The state of HDFS is stored in a file called fsimage and is the base of the metadata. During the runtime modifications are just written to a log file called edits. On the next start-up of the NameNode the state is read from fsimage, the changes from edits are applied to that and the new state is written back to fsimage. After this edits is cleared and contains is now ready for new log entries.

    Checkpoint Node

    A Checkpoint Node was introduced to solve the drawbacks of the NameNode. The changes are just written to edits and not merged to fsimage during the runtime. If the NameNode runs for a while edits gets huge and the next startup will take even longer because more changes have to be applied to the state to determine the last state of the metadata.

    The Checkpoint Node fetches periodically fsimage and edits from the NameNode and merges them. The resulting state is called checkpoint. After this is uploads the result to the NameNode.

    There was also a similiar type of node called “Secondary Node” but it doesn’t have the “upload to NameNode” feature. So the NameNode need to fetch the state from the Secondary NameNode. It also was confussing because the name suggests that the Secondary NameNode takes the request if the NameNode fails which isn’t the case.

    Backup Node

    The Backup Node provides the same functionality as the Checkpoint Node, but is synchronized with the NameNode. It doesn’t need to fetch the changes periodically because it receives a strem of file system edits. from the NameNode. It holds the current state in-memory and just need to save this to an image file to create a new checkpoint.