
How does Hadoop store blocks?


I am confused by the following case: a file (size < block size, replication = 2) is stored in HDFS. I run "hadoop fsck <fileName>" to count its blocks. Because the replication factor is 2, I expected "Total blocks" to be 2. However, "hadoop fsck" reports 1. The output looks like this:

Total blocks (validated): 1 (avg. block size 514399 B)

What's wrong? How does Hadoop store the file?


Solution

  • Your assumption is wrong. Hadoop counts blocks without counting their replicas: each block is counted once, no matter how many copies of it exist. You can check this when browsing your Hadoop file system. If you choose a file, you will see output like:

    Total number of blocks: 1
    471365007463424017:     IP1:Port        IP2:Port        IP3:Port
    

    This is one block that is stored on 3 different machines (for a replication factor of 3).
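
    To make the difference between blocks and replicas concrete, here is a rough sketch of the arithmetic in Python. It is not Hadoop code: the function name block_counts is made up for illustration, and the 64 MB block size is an assumption (the real value depends on your cluster configuration). The file size is the one from the fsck output in the question.

```python
def block_counts(file_size, block_size, replication):
    """Return (logical_blocks, stored_replicas) for a single file.

    logical_blocks  - blocks counted once each, ignoring copies
    stored_replicas - physical block copies spread over the DataNodes
    """
    # Integer ceiling division: a file smaller than one block still needs one block.
    logical_blocks = (file_size + block_size - 1) // block_size
    stored_replicas = logical_blocks * replication
    return logical_blocks, stored_replicas


# Assumed values: 64 MB block size (cluster dependent), replication factor 2.
BLOCK_SIZE = 64 * 1024 * 1024
logical, stored = block_counts(514399, BLOCK_SIZE, 2)
print(f"{logical} logical block(s), {stored} stored replica(s)")
```

    With these numbers the file has 1 logical block, which matches "Total blocks (validated): 1", and 2 physical copies of that block on different machines.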