hadoop hadoop2 namenode

How does Namenode reconstruct the full block information after restart?


I am trying to understand the Namenode, and I have referred to online material as well as the book Hadoop: The Definitive Guide.

I understand that the Namenode has concepts like "edit logs" and "fsimage", and I can see the following files in my Namenode directory.

========================================================================

-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 23 22:53 edits_0000000000000000001-0000000000000000001
-rw-r--r-- 1 root     root     1048576 Nov 23 23:42 edits_0000000000000000002-0000000000000000002
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 24 00:07 edits_0000000000000000003-0000000000000000003
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 24 21:03 edits_0000000000000000004-0000000000000000004
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 24 22:59 edits_0000000000000000005-0000000000000000005
-rw-r--r-- 1 root     root     1048576 Nov 24 23:00 edits_0000000000000000006-0000000000000000006
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 25 21:15 edits_0000000000000000007-0000000000000000007
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 25 21:34 edits_0000000000000000008-0000000000000000008
-rw-r--r-- 1 root     root     1048576 Nov 26 02:13 edits_inprogress_0000000000000000009
-rw-rw-r-- 1 vevaan24 vevaan24     355 Nov 25 21:15 fsimage_0000000000000000006
-rw-rw-r-- 1 vevaan24 vevaan24      62 Nov 25 21:15 fsimage_0000000000000000006.md5
-rw-r--r-- 1 root     root         355 Nov 26 00:12 fsimage_0000000000000000008
-rw-r--r-- 1 root     root          62 Nov 26 00:12 fsimage_0000000000000000008.md5
-rw-r--r-- 1 root     root           2 Nov 26 00:12 seen_txid
-rw-rw-r-- 1 vevaan24 vevaan24     201 Nov 26 00:12 VERSION

The book mentions that the fsimage doesn't store block locations.

I have the following questions:

1) Do the edit logs store the block locations as well (for new transactions)?

2) When the Namenode and Datanodes are restarted, how does the Namenode get the block addresses? My doubt is that the NN reads the fsimage to reconstruct the filesystem info, but the fsimage doesn't contain block locations, so how is this information reconstructed?

3) Is it true that the fsimage stores the BLOCK ID only, and if so, is a BLOCK ID unique across Datanodes? Is a BLOCK ID the same as a BLOCK address?


Solution

  • Block locations, i.e., the datanodes on which the blocks are stored, are persisted neither in the fsimage file nor in the edit log. The Namenode keeps this mapping only in memory.

    It is the responsibility of each datanode to keep track of the list of blocks it is storing.

    During restart, the Namenode loads the fsimage file into memory and applies the edits from the edit log. The missing block-location information is obtained from the datanodes as they check in with their block lists (their block reports). Using these block lists, the Namenode constructs the mapping from blocks to their locations in memory.

    The fsimage holds more than the Block ID. It holds information such as the blocks of each file, block size, replication factor, access time, modification time, and file permissions, but not the locations of the blocks.

    Yes, Block IDs are unique. A block address refers to the addresses of the datanodes on which the block resides.
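
The restart sequence described above can be illustrated with a small toy model. This is only a sketch, not Hadoop code: the class `ToyNamenode`, its methods, the edit operation names, the datanode names `dn1`/`dn2`, and the block IDs are all made up for illustration. It assumes the namespace (file to block IDs) comes from the fsimage plus edits, while the block to datanode mapping starts empty and is filled only from block reports.

```python
from dataclasses import dataclass, field


@dataclass
class INodeFile:
    """Per-file metadata of the kind kept in the fsimage (no block locations)."""
    path: str
    replication: int
    block_ids: list = field(default_factory=list)


class ToyNamenode:
    """Hypothetical, heavily simplified model of Namenode startup."""

    def __init__(self):
        self.namespace = {}        # path -> INodeFile, rebuilt from fsimage + edits
        self.block_locations = {}  # block ID -> set of datanodes, memory only

    def load_fsimage(self, files):
        # Step 1: load the last checkpoint of the namespace.
        for f in files:
            self.namespace[f.path] = f

    def apply_edit(self, op, path, replication=1, block_ids=()):
        # Step 2: replay edits made after the checkpoint (made-up op names).
        if op == "ADD":
            self.namespace[path] = INodeFile(path, replication, list(block_ids))
        elif op == "DELETE":
            self.namespace.pop(path, None)

    def receive_block_report(self, datanode, block_ids):
        # Step 3: each datanode reports the blocks it stores; blocks that
        # belong to no known file are ignored in this sketch.
        known = {b for f in self.namespace.values() for b in f.block_ids}
        for b in block_ids:
            if b in known:
                self.block_locations.setdefault(b, set()).add(datanode)

    def locate(self, path):
        # Block ID -> sorted list of datanodes currently known to hold it.
        return {b: sorted(self.block_locations.get(b, set()))
                for b in self.namespace[path].block_ids}


nn = ToyNamenode()
nn.load_fsimage([INodeFile("/a.txt", 2, [1001, 1002])])
nn.apply_edit("ADD", "/b.txt", replication=1, block_ids=[1003])

# Right after loading fsimage + edits, files and block IDs are known,
# but no locations are.
before = nn.locate("/a.txt")

# Datanodes check in with their block lists; 9999 is an unknown block.
nn.receive_block_report("dn1", [1001, 1003, 9999])
nn.receive_block_report("dn2", [1001, 1002])
after = nn.locate("/a.txt")

print(before)
print(after)
```

In this sketch, `before` has an empty location list for every block of `/a.txt`, and the lists are populated only once the (hypothetical) datanodes have reported, which mirrors why block locations never need to be written to the fsimage or the edit log.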