
HDFS block storage


  • 1) I have an HDP cluster with 2 DataNodes, but the HDFS replication factor is 3. Where is the third replica of each block stored in this case?
  • 2) If I upload a file to HDFS with a replication factor of 3,
    shouldn't the file size in HDFS triple, since there are 2
    extra copies?
  • 3) Is there a way to check which block of data resides on which DataNode? I understand that the metadata is kept
    in the NameNode, but is there a command that provides this information?

Solution

  • 1) Because the NameNode does not allow a DataNode to hold more than one replica of the same block, the maximum number of replicas created is the total number of DataNodes available at that time.

    Reference: https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Simple_Coherency_Model (Under Replica Placement: The First Baby Steps)

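    This means that with 2 DataNodes, at most 2 replicas of each block are created, even if the replication factor is set to 3. The third replica is not stored anywhere; HDFS reports the block as under-replicated until another DataNode becomes available.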

    2) The file size stays the same, but keeping three replicas triples the storage consumed. For example, a 2 GB file stored with 3 replicas occupies 6 GB of raw storage: 2 GB for the original file and 2 + 2 GB for the copies.
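
The arithmetic in both answers can be sketched in a few lines of Python. This is only a simplified model, not HDFS code: the function name `replica_plan` and its parameters are hypothetical, and it assumes every DataNode is live and has free space, ignoring block boundaries and rack placement.

```python
def replica_plan(file_size_gb, replication_factor, live_datanodes):
    """Estimate replica count and raw storage for one file (simplified model)."""
    # A DataNode holds at most one replica of a given block, so the number
    # of replicas that can exist is capped by the number of live DataNodes.
    actual_replicas = min(replication_factor, live_datanodes)
    return {
        "actual_replicas": actual_replicas,
        # Replicas HDFS still wants to create once more DataNodes appear.
        "missing_replicas": replication_factor - actual_replicas,
        # The size reported for the file itself does not change.
        "logical_size_gb": file_size_gb,
        # Raw disk space consumed across the cluster.
        "raw_storage_gb": file_size_gb * actual_replicas,
    }


# Hypothetical 2 GB file with replication factor 3.
two_nodes = replica_plan(2, 3, live_datanodes=2)
three_nodes = replica_plan(2, 3, live_datanodes=3)
print(two_nodes)
print(three_nodes)
```

Under these assumptions, the 2-DataNode cluster keeps 2 replicas (4 GB of raw storage, one replica missing), while a cluster with 3 DataNodes keeps all 3 replicas (6 GB of raw storage). In both cases the file's own size stays at 2 GB.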