How does checkpointing work in HDFS? I would like some clarity on fs.checkpoint.period and fs.checkpoint.size.

The documentation says that the secondary namenode checkpoints every hour (fs.checkpoint.period, in seconds) or sooner if the edit log has reached 64 MB (fs.checkpoint.size, in bytes). What exactly does that mean?

As I understand it, the edit log is stored on the local disk.

Solution

HDFS metadata can be thought of as consisting of two parts: the base filesystem table (stored in a file called fsimage) and the edit log, which lists changes made to the base table (stored in a file called edits). Checkpointing is the process of reconciling fsimage with edits to produce a new version of fsimage. This has two benefits: a more recent version of fsimage, and a truncated edit log.
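To make the reconciliation concrete, here is a toy sketch. It is not the real HDFS on-disk format or code: the dict standing in for fsimage, the list of tuples standing in for edits, and the `checkpoint` function are all assumptions made up for illustration.

```python
def checkpoint(fsimage, edits):
    """Replay every logged change onto a copy of the base table.

    fsimage: dict mapping path -> entry (hypothetical stand-in for fsimage)
    edits:   list of (op, path, value) tuples (hypothetical stand-in for edits)
    """
    new_image = dict(fsimage)
    for op, path, value in edits:
        if op == "create":
            new_image[path] = value
        elif op == "delete":
            new_image.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")
    # Return the new fsimage and a truncated (empty) edit log.
    return new_image, []


fsimage = {"/a": "file", "/b": "file"}
edits = [("create", "/c", "file"), ("delete", "/a", None)]

fsimage, edits = checkpoint(fsimage, edits)
print(fsimage)  # {'/b': 'file', '/c': 'file'}
print(edits)    # []
```

After the call, the base table already reflects every logged change, so the log can start over from empty.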

fs.checkpoint.period controls how often this reconciliation is triggered. A value of 3600 (seconds) means that fsimage is updated and the edit log truncated every hour. Checkpointing is not cheap, so there is a trade-off between running it too often and letting the edit log grow too large. Set this parameter to strike a good balance for the typical filesystem use in your cluster.

fs.checkpoint.size is a size threshold in bytes. If the edits file reaches it, a checkpoint is triggered immediately, regardless of the time elapsed since the last checkpoint. This is insurance against the edit log growing too large under unusually heavy write traffic to the filesystem metadata.
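Put together, the two settings amount to an "either condition" test. The sketch below only illustrates that rule using the values from the question (3600 seconds and 64 MB); `should_checkpoint` and the constant names are hypothetical, and the real secondary namenode's loop is more involved than this.

```python
CHECKPOINT_PERIOD_S = 3600            # fs.checkpoint.period, in seconds
CHECKPOINT_SIZE_B = 64 * 1024 * 1024  # fs.checkpoint.size, in bytes


def should_checkpoint(seconds_since_last, edits_size_bytes,
                      period=CHECKPOINT_PERIOD_S, size=CHECKPOINT_SIZE_B):
    """Return True if either the time limit or the size limit has been reached."""
    return seconds_since_last >= period or edits_size_bytes >= size


MB = 1024 * 1024
print(should_checkpoint(600, 10 * MB))   # False: neither limit reached
print(should_checkpoint(3600, 1 * MB))   # True: an hour has passed
print(should_checkpoint(600, 80 * MB))   # True: edit log exceeded 64 MB
```

So on a quiet cluster the hourly timer is what fires, while under a burst of metadata writes the size threshold fires first.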