java hadoop hdfs hadoop-yarn hadoop2

Block pool in Hadoop


I was going through Hadoop tutorials and had the following doubt about block pools in Hadoop.

Block pool: each block pool is managed independently of the others, and each one is the set of blocks that belong to a single namespace.

Is a block pool a virtual concept, or is it something like metadata about blocks that is maintained in memory?


Solution

  • It is metadata about each block of data.

    Files in Hadoop are divided into blocks, and these blocks are stored on different datanodes. To access the data again, we need to know where each block is stored. The namenode keeps track of this with the help of block pools.
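    To make the "where is each block stored" bookkeeping concrete, here is a minimal toy model in plain Java. It is not Hadoop's real code: the class `BlockMapSketch`, its fields, and its methods are hypothetical, and the only real-world value assumed is the Hadoop 2 default block size of 128 MB (`dfs.blocksize`). It maps each file to its list of block IDs and each block ID to the datanodes holding a replica.

    ```java
    import java.util.*;

    // Hypothetical toy model, NOT Hadoop's real classes: a namenode-like component
    // that maps files to blocks and blocks to the datanodes holding replicas.
    public class BlockMapSketch {
        // Assumed block size: 128 MB, the Hadoop 2 default for dfs.blocksize.
        static final long BLOCK_SIZE = 128L * 1024 * 1024;

        // file path -> ordered list of block IDs
        final Map<String, List<Long>> fileToBlocks = new HashMap<>();
        // block ID -> datanodes that hold a replica of that block
        final Map<Long, Set<String>> blockToNodes = new HashMap<>();
        private long nextBlockId = 1;

        // Number of blocks needed for a file of this length (ceiling division).
        static long blockCount(long fileLength) {
            return (fileLength + BLOCK_SIZE - 1) / BLOCK_SIZE;
        }

        // Registers a file and records every one of its blocks on the given datanodes.
        void addFile(String path, long length, List<String> nodes) {
            List<Long> blocks = new ArrayList<>();
            for (long i = 0; i < blockCount(length); i++) {
                long id = nextBlockId++;
                blocks.add(id);
                blockToNodes.put(id, new HashSet<>(nodes));
            }
            fileToBlocks.put(path, blocks);
        }

        // Answers "which datanodes store block number blockIndex of this file?"
        Set<String> locate(String path, int blockIndex) {
            long id = fileToBlocks.get(path).get(blockIndex);
            return blockToNodes.getOrDefault(id, Set.of());
        }

        public static void main(String[] args) {
            BlockMapSketch nn = new BlockMapSketch();
            // A 300 MB file needs 3 blocks of 128 MB (the last one is partly filled).
            nn.addFile("/finance/report.csv", 300L * 1024 * 1024, List.of("dn1", "dn2", "dn3"));
            System.out.println(nn.fileToBlocks.get("/finance/report.csv").size()); // 3
            System.out.println(nn.locate("/finance/report.csv", 2));
        }
    }
    ```

    Two separate maps are used so that looking up a file's blocks and looking up a block's locations are each a single hash lookup.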

    Block pools are thus metadata about each block of each file on the Hadoop cluster. The namenode keeps the block locations in memory, not on disk, so if the namenode shuts down, this information has to be reconstructed: when it starts again, each datanode sends it a block report listing the blocks it holds.
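    The reconstruction step can be sketched as a toy model too. Again, `BlockReportSketch`, `restart()` and `processBlockReport()` are hypothetical names, not Hadoop's API. The sketch covers only the block-to-datanode locations, which the namenode does not persist; in real HDFS the namespace itself (files and their block lists) is persisted in the fsimage and edit log.

    ```java
    import java.util.*;

    // Hypothetical toy model, NOT Hadoop's real classes: block locations live only
    // in memory and are rebuilt from datanode block reports after a restart.
    public class BlockReportSketch {
        // block ID -> datanodes that reported a replica; never written to disk here
        final Map<Long, Set<String>> blockToNodes = new HashMap<>();

        // A restart loses everything that was held only in memory.
        void restart() {
            blockToNodes.clear();
        }

        // Handles one datanode's report of the block IDs it currently stores.
        void processBlockReport(String dataNode, Collection<Long> blockIds) {
            for (long id : blockIds) {
                blockToNodes.computeIfAbsent(id, k -> new HashSet<>()).add(dataNode);
            }
        }

        public static void main(String[] args) {
            BlockReportSketch nn = new BlockReportSketch();
            nn.processBlockReport("dn1", List.of(1L, 2L));
            nn.processBlockReport("dn2", List.of(2L, 3L));

            nn.restart();
            System.out.println(nn.blockToNodes.isEmpty()); // true: locations are gone

            // After the restart, the datanodes report again and the map is rebuilt.
            nn.processBlockReport("dn1", List.of(1L, 2L));
            nn.processBlockReport("dn2", List.of(2L, 3L));
            System.out.println(nn.blockToNodes.get(2L).size()); // 2: dn1 and dn2
        }
    }
    ```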

    Now, in Hadoop Federation we have the concept of multiple namespaces, with different namenodes responsible for different namespaces. Suppose we have two machines acting as namenodes:

    1. The first namenode (NN1) handles all files under the namespace /finance, i.e. all data of the finance department.
    2. Similarly, the second namenode (NN2) handles the data of the accounts department under the namespace /accounts.

    Now, to manage the blocks of files under the /finance namespace, only NN1 is needed, so only NN1 needs to hold the block pool of the /finance namespace. Similarly, to know about files under /accounts we need only NN2, and only NN2 holds the block pool of the /accounts namespace. Thus the two namenodes act independently.
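    The federation example can be sketched as a toy model as well. The class and field names, the path-prefix routing, and the pool IDs `BP-finance` and `BP-accounts` are all hypothetical (real block pool IDs are generated strings starting with `BP-`). Each namenode owns one namespace and one block pool; the datanode is shared storage that keeps blocks grouped by pool ID, since in federation every datanode registers with all namenodes.

    ```java
    import java.util.*;

    // Hypothetical toy model of HDFS Federation, NOT Hadoop's real classes.
    public class FederationSketch {
        // A namenode owning exactly one namespace (path prefix) and one block pool.
        static final class NameNode {
            final String name, namespacePrefix, blockPoolId;
            final Set<Long> blockPool = new HashSet<>(); // block IDs of this pool only

            NameNode(String name, String namespacePrefix, String blockPoolId) {
                this.name = name;
                this.namespacePrefix = namespacePrefix;
                this.blockPoolId = blockPoolId;
            }
        }

        // A datanode shared by all namenodes; it keeps blocks separated by pool ID.
        static final class DataNode {
            final Map<String, Set<Long>> blocksByPool = new HashMap<>();

            void store(String blockPoolId, long blockId) {
                blocksByPool.computeIfAbsent(blockPoolId, k -> new HashSet<>()).add(blockId);
            }
        }

        final List<NameNode> nameNodes = new ArrayList<>();

        // Picks the namenode whose namespace contains the path (assumed prefix routing).
        NameNode route(String path) {
            for (NameNode nn : nameNodes) {
                if (path.equals(nn.namespacePrefix) || path.startsWith(nn.namespacePrefix + "/")) {
                    return nn;
                }
            }
            throw new IllegalArgumentException("no namespace for " + path);
        }

        // Only the owning namenode records the block; the datanode files it under that pool.
        void writeBlock(String path, long blockId, DataNode dn) {
            NameNode owner = route(path);
            owner.blockPool.add(blockId);
            dn.store(owner.blockPoolId, blockId);
        }

        public static void main(String[] args) {
            FederationSketch cluster = new FederationSketch();
            NameNode nn1 = new NameNode("NN1", "/finance", "BP-finance");
            NameNode nn2 = new NameNode("NN2", "/accounts", "BP-accounts");
            cluster.nameNodes.add(nn1);
            cluster.nameNodes.add(nn2);

            DataNode dn = new DataNode();
            cluster.writeBlock("/finance/q1.csv", 1L, dn);
            cluster.writeBlock("/accounts/ledger.csv", 2L, dn);

            System.out.println(nn1.blockPool); // [1]: NN1 knows nothing about block 2
            System.out.println(nn2.blockPool); // [2]
        }
    }
    ```

    Keeping one block pool per namenode means neither namenode has to coordinate with the other when it creates or deletes blocks, which is the independence described above.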