Can you please help me with the scenarios below?
1) When using Hadoop V2, do we use a Secondary NameNode in a production environment?
2) For Hadoop V2, suppose we use multiple NameNodes in an active/passive configuration for High Availability, and the edits log file grows very large.
How does the edits log get applied to the fsimage? If it is applied at startup, wouldn't applying a huge edits log make NameNode startup time-consuming? (In Hadoop V1, the Secondary NameNode solved this problem.)
Answers to your queries:
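1) When using Hadoop V2, do we use a Secondary NameNode in a production environment?
A Secondary NameNode is not required in a production environment if you deploy a Standby NameNode for NameNode High Availability. In an HA cluster the Standby NameNode also performs the checkpoints, so a Secondary NameNode should not be run alongside it.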
2) How does the edits log get applied to the fsimage in the absence of a Secondary NameNode?
To answer this, you have to understand that high availability has been implemented in Hadoop in two different ways: High Availability with QJM and High Availability with NFS (shared storage).
Of these two approaches, QJM (Quorum Journal Manager) is preferred.
In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining enough state to provide a fast failover if necessary.
In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate with a group of separate daemons called “JournalNodes” (JNs).
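When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs. The Standby node reads these edits from the JNs and applies them to its own namespace. Because it always holds an up-to-date namespace in memory, the Standby also performs the periodic checkpoints (merging the applied edits into a new fsimage and sending it to the Active node), which is the job the Secondary NameNode did in Hadoop V1. A restarting NameNode therefore only has to replay the edits made since the last checkpoint.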
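The flow above can be sketched as a toy model. This is only an illustration, not HDFS code: the classes `JournalNode`, `ActiveNameNode`, and `StandbyNameNode`, the `("mkdir", path)` operation tuples, and the `checkpoint_txns` threshold are all hypothetical names made up for the example (the threshold is loosely modeled on the `dfs.namenode.checkpoint.txns` setting). It assumes an edit counts as durable once more than half of the JNs acknowledge it, and that the Standby writes a new fsimage once enough transactions have accumulated since the last one. Real JournalNodes also handle uncommitted edits, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Toy stand-in for a JournalNode: stores edits keyed by transaction id."""
    up: bool = True
    edits: dict = field(default_factory=dict)

    def write(self, txid, op):
        if not self.up:
            return False
        self.edits[txid] = op
        return True


class ActiveNameNode:
    def __init__(self, journal_nodes):
        self.jns = journal_nodes
        self.last_txid = 0

    def log_edit(self, op):
        """Treat an edit as durable only if a majority of JNs acknowledge it."""
        txid = self.last_txid + 1
        acks = sum(jn.write(txid, op) for jn in self.jns)
        if acks <= len(self.jns) // 2:
            raise RuntimeError("no quorum: edit not durable")
        self.last_txid = txid
        return txid


class StandbyNameNode:
    def __init__(self, journal_nodes, checkpoint_txns):
        self.jns = journal_nodes
        self.checkpoint_txns = checkpoint_txns  # hypothetical threshold
        self.namespace = set()                  # in-memory namespace (a set of paths)
        self.applied_txid = 0
        self.fsimage = (0, frozenset())         # (last txid covered, snapshot)

    def tail_edits(self):
        """Apply new edits in txid order, then checkpoint if enough have piled up."""
        while True:
            nxt = self.applied_txid + 1
            op = next((jn.edits[nxt] for jn in self.jns
                       if jn.up and nxt in jn.edits), None)
            if op is None:
                break
            action, path = op
            if action == "mkdir":
                self.namespace.add(path)
            else:  # anything else is treated as a delete in this toy model
                self.namespace.discard(path)
            self.applied_txid = nxt
        if self.applied_txid - self.fsimage[0] >= self.checkpoint_txns:
            # Checkpoint: the fsimage now covers every applied edit.
            self.fsimage = (self.applied_txid, frozenset(self.namespace))


# Three JNs with one down: two acknowledgements still form a majority.
jns = [JournalNode(), JournalNode(), JournalNode(up=False)]
active = ActiveNameNode(jns)
standby = StandbyNameNode(jns, checkpoint_txns=2)
active.log_edit(("mkdir", "/a"))
active.log_edit(("mkdir", "/b"))
active.log_edit(("delete", "/a"))
standby.tail_edits()
```

After `tail_edits()` the Standby has applied all three edits and its fsimage covers transaction 3, so nothing older than the checkpoint would need replaying on a restart.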
In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.
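A minimal sketch of that catch-up step, assuming the edits are available as a mapping from transaction id to operation. The function name `promote_to_active` and the string operations are hypothetical and only illustrate the ordering rule: replay everything newer than what was already applied, refuse to promote if a transaction is missing.

```python
def promote_to_active(applied_txid, journal_edits):
    """Replay every edit newer than applied_txid, then switch to active.

    journal_edits: dict mapping txid -> operation, as read from the JNs.
    Returns (state, last_applied_txid, replayed_ops).
    """
    replayed = []
    for txid in sorted(t for t in journal_edits if t > applied_txid):
        if txid != applied_txid + 1:
            # A missing transaction means the namespace could not be
            # brought fully up to date, so do not promote.
            raise RuntimeError(f"gap in edit log before txid {txid}")
        replayed.append(journal_edits[txid])
        applied_txid = txid
    return "active", applied_txid, replayed


edits = {1: "mkdir /a", 2: "mkdir /b", 3: "delete /a"}
# The Standby had already applied txid 1 before the failover started.
state, last_txid, replayed = promote_to_active(1, edits)
```

Here only transactions 2 and 3 are replayed before the state changes, which is why a Standby that has been tailing edits all along can take over quickly.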
It is vital for an HA cluster that only one of the NameNodes be Active at a time. ZooKeeper is used to coordinate automatic failover and to elect a single Active NameNode, which helps avoid a split-brain scenario in which the namespace state diverges between the two NameNodes.
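The single-Active rule can be illustrated with a toy lock-based election. This is a sketch under assumptions, not the ZooKeeper API: `LockService` and `FailoverController` are hypothetical in-memory classes that only mimic the idea of an ephemeral lock, where whoever holds the lock is Active and the lock disappears when the holder's session ends. Real deployments also rely on fencing of the previous Active node, which is not modeled here.

```python
class LockService:
    """Toy stand-in for a coordination service holding one ephemeral lock."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, node):
        # The lock is granted only if nobody holds it (or the caller already does).
        if self.holder is None:
            self.holder = node
        return self.holder == node

    def session_expired(self, node):
        # An ephemeral lock vanishes when its holder's session ends.
        if self.holder == node:
            self.holder = None


class FailoverController:
    """Hypothetical per-NameNode controller that takes part in the election."""

    def __init__(self, name, lock):
        self.name = name
        self.lock = lock
        self.state = "standby"

    def run_election(self):
        self.state = "active" if self.lock.try_acquire(self.name) else "standby"
        return self.state

    def lose_session(self):
        self.lock.session_expired(self.name)
        self.state = "standby"


lock = LockService()
nn1 = FailoverController("nn1", lock)
nn2 = FailoverController("nn2", lock)

nn1.run_election()   # nn1 takes the lock and becomes active
nn2.run_election()   # lock is taken, so nn2 stays standby
nn1.lose_session()   # nn1 crashes or is partitioned; the lock is released
nn2.run_election()   # nn2 now wins the election
nn1.run_election()   # nn1 comes back but can only be standby
```

At every step at most one controller reports `"active"`, because the state is derived from a single shared lock rather than from each node's own opinion.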
I have explained the NameNode failover process in detail in my answer to another Stack Overflow question: How does Hadoop Namenode failover process works?