hadoop hdfs datanode

If Hadoop DataNodes have different numbers of data directories, can block replication problems occur?


Suppose you have 20 nodes in a cluster. 15 nodes have 10 data directories each (/data01, ... /data10), and the other 5 nodes have only 6. The number of data directories per DataNode is therefore unbalanced.
In this case, is a block replication problem more likely (e.g. ReplicationNotFoundException, BlockMissingException)?

If so, what can be done other than adding disks in this case? Thank you.


Solution

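  • Block placement takes rack awareness into account and creates as many replicas as the replication factor requires. The replication factor can be set for HDFS as a whole or per file. Replica targets are chosen per DataNode, not per data directory, so an uneven number of directories should not by itself cause replication errors.

    If disks are removed without following the proper procedure after data has been uploaded to HDFS, a BlockMissingException may occur for blocks whose replicas were all on the removed disks. As long as at least one replica survives on another node, HDFS re-replicates the affected blocks automatically.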
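
    To illustrate why the directory count matters for capacity rather than for replication itself, here is a minimal Python sketch of the 20-node cluster from the question. It is a toy model, not HDFS's actual placement policy: it ignores rack awareness, picks target nodes uniformly at random, and assumes every data directory holds the same number of blocks. All names (`BLOCKS_PER_DIR`, `build_cluster`, `place_block`, `simulate`) are hypothetical.

```python
import random

# Assumption: every data directory can hold the same number of blocks.
BLOCKS_PER_DIR = 1000


def build_cluster():
    """Model the cluster from the question: 15 nodes with 10 dirs, 5 with 6."""
    nodes = []
    for i in range(20):
        dirs = 10 if i < 15 else 6
        nodes.append({
            "name": f"dn{i:02d}",
            "capacity": dirs * BLOCKS_PER_DIR,
            "used": 0,
        })
    return nodes


def place_block(nodes, replication, rng):
    """Store one block on `replication` distinct nodes that still have space."""
    candidates = [n for n in nodes if n["used"] < n["capacity"]]
    if len(candidates) < replication:
        raise RuntimeError("not enough DataNodes with free space")
    # sample() returns distinct nodes, so no node gets two replicas of a block.
    targets = rng.sample(candidates, replication)
    for node in targets:
        node["used"] += 1
    return targets


def simulate(num_blocks, replication=3, seed=0):
    """Write `num_blocks` blocks and return the resulting node states."""
    rng = random.Random(seed)
    nodes = build_cluster()
    for _ in range(num_blocks):
        place_block(nodes, replication, rng)
    return nodes


if __name__ == "__main__":
    for node in simulate(20000):
        pct = 100.0 * node["used"] / node["capacity"]
        print(f'{node["name"]}: {node["used"]}/{node["capacity"]} blocks ({pct:.1f}% full)')
```

    In this model every block gets its full set of replicas regardless of how many directories each node has. The 6-directory nodes simply fill up faster, so the practical concern is that they run out of space earlier, which balancing data across nodes can address without adding disks.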