hadoop block hdfs

How does a DataNode choose which directory to place a block in?


If the block replication factor is 3 in my Hadoop cluster and every DataNode has 3 ${dfs.data.dir} directories, then when a DataNode is chosen to store a block, is the block stored in all 3 directories or in only one of them?

If the answer is the latter, how is the ${dfs.data.dir} directory chosen?


Solution

  • The block is stored in only one of the directories. By default, the DataNode picks the directory in round-robin fashion as each block arrives. You can alter this behavior by setting dfs.datanode.fsdataset.volume.choosing.policy to org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy, in which case the directory is chosen based on the space available in each one (see the configuration reference: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml)
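
As a rough sketch, assuming a Hadoop 2.x cluster where DataNode settings live in hdfs-site.xml, switching to the available-space policy might look like the fragment below. The two tuning properties are optional; the values shown are what I believe the defaults to be, so check them against the hdfs-default.xml for your version before relying on them.

```xml
<!-- hdfs-site.xml on each DataNode (assumed location; adjust to your deployment) -->
<configuration>
  <!-- Replace the default round-robin policy with the available-space policy -->
  <property>
    <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
    <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
  </property>

  <!-- Optional: volumes whose free space differs by less than this many bytes
       are treated as balanced (value assumed to be the default, 10 GB) -->
  <property>
    <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
    <value>10737418240</value>
  </property>

  <!-- Optional: fraction of new block allocations sent to the volumes with
       more free space (value assumed to be the default) -->
  <property>
    <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
    <value>0.75f</value>
  </property>
</configuration>
```

The DataNode typically needs a restart to pick up the new policy, and the setting only affects where new blocks are placed; blocks already on disk are not moved.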