
HDFS fsck command output


I got this in the fsck output and want to know what BP and blk mean. Can you explain what each part of this output means?

 BP-929597290-192.0.0.2-1439573305237:blk_1074084574_344316 len=2 repl=3 [DatanodeInfoWithStorage[192.0.0.9:1000,DS-730a75d3-046c-4254-990a-4eee9520424f,DISK], DatanodeInfoWithStorage[192.0.0.1:1000,DS-fc6ee5c7-e76b-4faa-b663-58a60240de4c,DISK], DatanodeInfoWithStorage[192.0.0.3:1000,DS-8ab81b26-309e-42d6-ae14-26eb88387cad,DISK]]

I guess 192.0.0.9:1000 is the IP address and port of the first replica of the data.


Solution

    1. BP-929597290-192.0.0.2-1439573305237

      This is the Block Pool ID. A block pool is the set of blocks that belong to a single namespace. For simplicity, you can say that all the blocks managed by one Name Node are in the same block pool.

      The Block Pool ID is formed as:

      String bpid = "BP-" + rand + "-"+ ip + "-" + Time.now();        
      
      Where: 
      rand = some random number
      ip = IP address of the Name Node
      Time.now() = current system time in milliseconds, taken when the block pool was created
      

      Read about Block Pools here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html
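
If you want to take a Block Pool ID apart yourself, here is a minimal sketch. The class name BlockPoolIdParser and its parse method are made up for this example and are not part of Hadoop. It assumes the ID has exactly the BP-<rand>-<ip>-<time> shape shown above and that the last field is in milliseconds.

```java
import java.time.Instant;

public class BlockPoolIdParser {

    // Hypothetical helper: splits "BP-<rand>-<ip>-<time>" into {rand, ip, time}.
    static String[] parse(String bpid) {
        if (!bpid.startsWith("BP-")) {
            throw new IllegalArgumentException("Not a block pool ID: " + bpid);
        }
        int firstDash = bpid.indexOf('-', 3);  // dash that ends <rand>
        int lastDash = bpid.lastIndexOf('-');  // dash that starts <time>
        if (firstDash < 0 || lastDash <= firstDash) {
            throw new IllegalArgumentException("Unexpected format: " + bpid);
        }
        String rand = bpid.substring(3, firstDash);
        String ip = bpid.substring(firstDash + 1, lastDash);
        String time = bpid.substring(lastDash + 1);
        return new String[] {rand, ip, time};
    }

    public static void main(String[] args) {
        String[] parts = parse("BP-929597290-192.0.0.2-1439573305237");
        System.out.println("rand    = " + parts[0]);
        System.out.println("ip      = " + parts[1]);
        // Assumption: the last field is milliseconds since the epoch.
        System.out.println("created = " + Instant.ofEpochMilli(Long.parseLong(parts[2])));
    }
}
```

The IP is taken as everything between the first and the last dash after the prefix, so the sketch does not depend on how many dots the address contains.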

    2. blk_1074084574_344316:

      Name of the block. Each block in HDFS is given a unique identifier.

      The block name is formed as:

      blk_<blockid>_<genstamp> 
      
      Where: 
      blockid = ID of the block
      genstamp = an incrementing integer that records the version of a particular block
      

      Read about generation stamp here: http://blog.cloudera.com/blog/2009/07/file-appends-in-hdfs/
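
The same idea works for the block name. This is only a sketch: BlockNameParser is a hypothetical class written for this answer, not a Hadoop API, and it assumes the name is exactly blk_<blockid>_<genstamp> with both fields fitting in a long.

```java
public class BlockNameParser {

    // Hypothetical helper: splits "blk_<blockid>_<genstamp>" into {blockId, genStamp}.
    static long[] parse(String blockName) {
        String[] parts = blockName.split("_");
        if (parts.length != 3 || !parts[0].equals("blk")) {
            throw new IllegalArgumentException("Not a block name: " + blockName);
        }
        long blockId = Long.parseLong(parts[1]);
        long genStamp = Long.parseLong(parts[2]);
        return new long[] {blockId, genStamp};
    }

    public static void main(String[] args) {
        long[] parsed = parse("blk_1074084574_344316");
        System.out.println("block ID         = " + parsed[0]);
        System.out.println("generation stamp = " + parsed[1]);
    }
}
```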

    3. len=2

      Length of the block: Number of bytes in the block

    4. repl=3

      There are 3 replicas of this block

    5. DatanodeInfoWithStorage[192.0.0.9:1000,DS-730a75d3-046c-4254-990a-4eee9520424f,DISK]

      Where:

      192.0.0.9 => IP address of the Data Node holding this block
      1000 => Data streaming port
      DS-730a75d3-046c-4254-990a-4eee9520424f => Storage ID. It is an internal ID of the Data Node, assigned when the Data Node registers with the Name Node
      DISK => storageType. It is DISK here. Storage type can be: RAM_DISK, SSD, DISK and ARCHIVE
      

    The description in point 5 also applies to the other 2 replica locations of this block:

    DatanodeInfoWithStorage[192.0.0.1:1000,DS-fc6ee5c7-e76b-4faa-b663-58a60240de4c,DISK], 
    DatanodeInfoWithStorage[192.0.0.3:1000,DS-8ab81b26-309e-42d6-ae14-26eb88387cad,DISK]]
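
Putting the five points together, a whole entry can be split with two regular expressions. This is a rough sketch: FsckBlockLineParser and its methods are names invented for this example, and the patterns assume the line looks exactly like the one in the question. Other Hadoop versions may print these fields differently, so check against your own output before relying on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FsckBlockLineParser {

    // "<block pool ID>:<block name> len=<bytes> repl=<replica count>"
    private static final Pattern HEADER =
        Pattern.compile("(BP-\\S+?):(blk_-?\\d+_\\d+) len=(\\d+) repl=(\\d+)");

    // "DatanodeInfoWithStorage[<ip>:<port>,<storage ID>,<storage type>]"
    private static final Pattern REPLICA =
        Pattern.compile("DatanodeInfoWithStorage\\[([^:\\]]+):(\\d+),([^,\\]]+),([^\\]]+)\\]");

    // Returns {blockPoolId, blockName, len, repl} as strings.
    static String[] header(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("No block entry found in line");
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    // Returns one {ip, port, storageId, storageType} array per replica location.
    static List<String[]> replicas(String line) {
        List<String[]> result = new ArrayList<>();
        Matcher m = REPLICA.matcher(line);
        while (m.find()) {
            result.add(new String[] {m.group(1), m.group(2), m.group(3), m.group(4)});
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "BP-929597290-192.0.0.2-1439573305237:blk_1074084574_344316 len=2 repl=3 "
            + "[DatanodeInfoWithStorage[192.0.0.9:1000,DS-730a75d3-046c-4254-990a-4eee9520424f,DISK], "
            + "DatanodeInfoWithStorage[192.0.0.1:1000,DS-fc6ee5c7-e76b-4faa-b663-58a60240de4c,DISK], "
            + "DatanodeInfoWithStorage[192.0.0.3:1000,DS-8ab81b26-309e-42d6-ae14-26eb88387cad,DISK]]";

        String[] h = header(line);
        System.out.println("pool=" + h[0] + " block=" + h[1] + " len=" + h[2] + " repl=" + h[3]);
        for (String[] r : replicas(line)) {
            System.out.println("  ip=" + r[0] + " port=" + r[1] + " storage=" + r[2] + " type=" + r[3]);
        }
    }
}
```

For the line in the question, replicas returns three entries, one per Data Node, which matches repl=3.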