hadoop2

Checkpointing in a NameNode setup without High Availability


In a setup where NameNode High Availability is not configured, how does the Secondary NameNode handle checkpointing?


Solution

  • If High Availability is not configured, the Secondary NameNode is used by default, just as it was in Hadoop 1.

    If you're not familiar with the Hadoop 1 concept of the Secondary NameNode and checkpointing, here is a short description, but you may want to refer to the Apache docs.

    The checkpointing concept works as follows:

    The NameNode records every change made to the HDFS namespace (for example file permissions, file names, ACL permissions, replication factor) in the edit log as it happens. These changes are kept only in the edit log until a checkpoint permanently merges them into the fsimage.

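    FYI, by default a checkpoint is taken every 60 minutes (dfs.namenode.checkpoint.period = 3600 seconds), or sooner if 1,000,000 uncheckpointed transactions have accumulated (dfs.namenode.checkpoint.txns).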

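    The NameNode stores the edit logs and the fsimage on its local file system (LFS). At each checkpoint, the Secondary NameNode (SNN) fetches the current fsimage and edit logs from the NameNode, merges the edits into a new fsimage, and sends that fsimage back to the NameNode. This keeps the edit log from growing without bound and keeps NameNode restarts fast. Why is it sometimes thought of as a backup node? Because the SNN also keeps a copy of the last saved fsimage: if the NameNode goes down or loses its metadata, that checkpoint can be used to restore the metadata (for example with hdfs namenode -importCheckpoint). It is not a hot standby, though: any changes made after the last checkpoint are not on the SNN.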

    That's the basic idea behind the NN and the Secondary NN.
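The checkpoint cycle described in this answer can be sketched as a toy Python simulation. This is a minimal, hypothetical model, not Hadoop code: the NameNode, SecondaryNameNode, and should_checkpoint names are made up for illustration, and the namespace is reduced to a path-to-replication-factor map. Only the two thresholds mirror the default values of dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns.

```python
from dataclasses import dataclass, field

# Default values of dfs.namenode.checkpoint.period (seconds) and
# dfs.namenode.checkpoint.txns; everything else here is a toy model.
CHECKPOINT_PERIOD_S = 3600
CHECKPOINT_TXNS = 1_000_000


@dataclass
class NameNode:
    """Hypothetical NameNode whose namespace is just path -> replication factor."""
    fsimage: dict = field(default_factory=dict)
    edit_log: list = field(default_factory=list)

    def set_replication(self, path, replication):
        # A change is only appended to the edit log; the fsimage is untouched.
        self.edit_log.append((path, replication))

    def roll_edit_log(self):
        # Close the current edit segment and start a new, empty one.
        finished, self.edit_log = self.edit_log, []
        return finished

    def current_namespace(self):
        # Live state = last fsimage with the edits since then replayed on top.
        ns = dict(self.fsimage)
        for path, replication in self.edit_log:
            ns[path] = replication
        return ns


def should_checkpoint(seconds_since_last, uncheckpointed_txns):
    # Checkpoint when the period has elapsed OR enough transactions piled up.
    return (seconds_since_last >= CHECKPOINT_PERIOD_S
            or uncheckpointed_txns >= CHECKPOINT_TXNS)


@dataclass
class SecondaryNameNode:
    """Hypothetical SNN that merges edits into the fsimage."""
    last_checkpoint: dict = field(default_factory=dict)

    def do_checkpoint(self, nn):
        edits = nn.roll_edit_log()          # 1. NN rolls its edit log
        image = dict(nn.fsimage)            # 2. SNN copies the fsimage ...
        for path, replication in edits:     # 3. ... and replays the edits on it
            image[path] = replication
        self.last_checkpoint = dict(image)  # 4. SNN keeps its own copy
        nn.fsimage = image                  # 5. merged fsimage goes back to the NN


nn = NameNode(fsimage={"/data/a.txt": 3})
snn = SecondaryNameNode()
nn.set_replication("/data/a.txt", 2)
nn.set_replication("/data/b.txt", 3)
print(nn.fsimage)  # {'/data/a.txt': 3}  (edits not merged yet)

if should_checkpoint(seconds_since_last=3600, uncheckpointed_txns=len(nn.edit_log)):
    snn.do_checkpoint(nn)
print(nn.fsimage)   # {'/data/a.txt': 2, '/data/b.txt': 3}
print(nn.edit_log)  # []
```

In this model, any edit made after do_checkpoint lands only in the NameNode's new edit log and not in snn.last_checkpoint, which is why the SNN's copy is a checkpoint rather than a live backup.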