I am trying to understand Namenode and I referred to online material and referring to book Hadoop: The definitive guide as well.
I understand that Namenode has concept like : "edit logs", "fsimage", and I can see the following files in my Namenode.
========================================================================
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 23 22:53 edits_0000000000000000001-0000000000000000001
-rw-r--r-- 1 root root 1048576 Nov 23 23:42 edits_0000000000000000002-0000000000000000002
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 24 00:07 edits_0000000000000000003-0000000000000000003
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 24 21:03 edits_0000000000000000004-0000000000000000004
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 24 22:59 edits_0000000000000000005-0000000000000000005
-rw-r--r-- 1 root root 1048576 Nov 24 23:00 edits_0000000000000000006-0000000000000000006
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 25 21:15 edits_0000000000000000007-0000000000000000007
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 25 21:34 edits_0000000000000000008-0000000000000000008
-rw-r--r-- 1 root root 1048576 Nov 26 02:13 edits_inprogress_0000000000000000009
-rw-rw-r-- 1 vevaan24 vevaan24 355 Nov 25 21:15 fsimage_0000000000000000006
-rw-rw-r-- 1 vevaan24 vevaan24 62 Nov 25 21:15 fsimage_0000000000000000006.md5
-rw-r--r-- 1 root root 355 Nov 26 00:12 fsimage_0000000000000000008
-rw-r--r-- 1 root root 62 Nov 26 00:12 fsimage_0000000000000000008.md5
-rw-r--r-- 1 root root 2 Nov 26 00:12 seen_txid
-rw-rw-r-- 1 vevaan24 vevaan24 201 Nov 26 00:12 VERSION
In that book it was mentioned that fsimage
doesn't store the block locations in it.
I have following questions:
1) Does edit logs
store the block locations as well? (for the new transactions?)
2) When Namenode and Datanode are restarted how does Namenode get the block address? My doubt is NN read fsimage
to reconstuct the filesystem info, but fsimage
doesn't have the info of block location, so how this information is reconstructed?
3) Is it true that fsimage
stores BLOCK ID only, and if so, is BLOCK ID unique across Datanodes? Is BLOCK ID same as that of BLOCK address ?
Block locations i.e., the datanodes on which the blocks are stored is neither persisted in the fsimage
file nor in the edit log
. Namenode keeps this mapping only in the memory.
It is the responsibility of each datanode to hold the information of the list of blocks it is storing.
During restart, Namenode loads the fsimage
file into memory and apply the edits from the edit log
, the missing information of block locations is obtained from the datanodes as they check in with their block lists. Namenode, with the information from block lists, constructs the mapping of blocks with their locations in its memory.
fsimage
has more than the Block ID. It holds the information like blocks of the file, block size, replication factor, access time, modification time, file permissions but not the location of the blocks.
Yes, Block IDs are unique. Block address would refer the address of the datanodes in which the block resides.