I am confused by the following case: a file (size < block size, replication factor = 2) is stored in HDFS. I run "hadoop fsck" followed by the file name to count its blocks. Because the replication factor is 2, I expected "Total blocks" to be 2. However, "hadoop fsck" reports 1. The output looks like this:
Total blocks (validated): 1 (avg. block size 514399 B)
What's wrong? How does Hadoop store the file?
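Your assumption is wrong. Hadoop counts blocks without counting their replicas, so a file smaller than the block size is always reported as one block, whatever the replication factor. You can check this when browsing your Hadoop file system. If you choose a file, you will see output like: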
Total number of blocks: 1
471365007463424017: IP1:Port IP2:Port IP3:Port
This is one block that is stored on 3 different machines (for a replication factor of 3). With your replication factor of 2, you would see two locations listed for the same single block.
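To make the difference between blocks and replicas concrete, here is a small sketch of the arithmetic. It is not Hadoop code: the function name `block_counts` is made up for illustration, and the 64 MB block size is an assumption (the question does not state the configured block size). The file size is taken from the fsck output above.

```python
def block_counts(file_size, block_size, replication):
    """Return (logical_blocks, stored_replicas) for a single file.

    logical_blocks is the kind of number reported as "Total blocks";
    stored_replicas is how many physical copies exist across the cluster.
    """
    # Ceiling division: a file smaller than one block still occupies one block.
    logical_blocks = (file_size + block_size - 1) // block_size
    stored_replicas = logical_blocks * replication
    return logical_blocks, stored_replicas


# Assumed block size of 64 MB (hypothetical; check your own configuration).
BLOCK_SIZE = 64 * 1024 * 1024

# File from the question: 514399 bytes, replication factor 2.
blocks, replicas = block_counts(514399, BLOCK_SIZE, 2)
print(blocks, replicas)
```

Under these assumptions the file has 1 block and 2 replicas of that block, which matches the fsck result: the block count stays at 1 and only the number of copies changes with the replication factor.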