Tags: hadoop, hdfs, hdp

Hadoop + Blocks with corrupt replicas


We have an HDP cluster, version 2.6.4, managed by the Ambari platform.

On the Ambari dashboard we can see "Blocks with corrupt replicas" with a value of 1,


and we also see it in the output of:

$ hdfs dfsadmin -report
Configured Capacity: 57734285504512 (52.51 TB)
Present Capacity: 55002945909856 (50.02 TB)
DFS Remaining: 29594344477833 (26.92 TB)
DFS Used: 25408601432023 (23.11 TB)
DFS Used%: 46.19%
Under replicated blocks: 0
Blocks with corrupt replicas: 1    <-----------------
Missing blocks: 0
Missing blocks (with replication factor 1): 0
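The specific counter can be pulled out of the report with a one-liner; a minimal sketch, run here against a captured copy of the report text so it works without the cluster (on a live system you would pipe `hdfs dfsadmin -report` instead):

```shell
# Extract the "Blocks with corrupt replicas" counter from a dfsadmin report.
# The sample text below stands in for `hdfs dfsadmin -report` output.
report='Under replicated blocks: 0
Blocks with corrupt replicas: 1
Missing blocks: 0'

corrupt=$(printf '%s\n' "$report" | awk -F': ' '/Blocks with corrupt replicas/ {print $2}')
echo "$corrupt"   # → 1
```

This is handy for scripting an alert outside Ambari, since the report wording has been stable across HDFS versions.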

In order to find the corrupted file, we did the following:

$ hdfs fsck -list-corruptfileblocks
Connecting to namenode via http://master.sys76.com:50070/fsck?ugi=hdfs&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files

but, as shown above, no corrupt files were found.

We also ran the following in order to delete the corrupted file:

 hdfs fsck / -delete

but "Blocks with corrupt replicas" still remains at 1.
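To cross-check the counter outside both Ambari and dfsadmin, the NameNode exposes the same figure as the CorruptBlocks attribute of its FSNamesystem JMX bean. A sketch, assuming the NameNode web UI from the question at master.sys76.com:50070; the parsing is shown against a captured JSON fragment so it runs without the cluster:

```shell
# Live query (requires the cluster):
#   curl -s 'http://master.sys76.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'
# Below we parse a captured fragment of that JSON instead.
jmx='{ "beans" : [ { "name" : "Hadoop:service=NameNode,name=FSNamesystem", "CorruptBlocks" : 1 } ] }'

corrupt=$(printf '%s\n' "$jmx" | grep -o '"CorruptBlocks" : [0-9]*' | grep -o '[0-9]*$')
echo "CorruptBlocks=$corrupt"
```

If the JMX value and the dfsadmin report agree, the counter is real and not a stale dashboard reading.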

Any suggestions?


Solution

  • A file signalled as having "blocks with corrupt replicas" is not a corrupt file; it does not mean the file has a "corrupt block" or has lost any data.

    A "block with corrupt replicas" is a block where at least one replica is corrupt BUT the block can still be recovered from the remaining (majority of the) replicas, which means the content can be rebuilt from those healthy copies.

    Also, the fsck command will not tell you anything about files with blocks in that state, because it only checks whether files and blocks in the filesystem are OK; since those blocks can be auto-fixed by HDFS, they are not reported.

    The only command that reports those files is hdfs dfsadmin -report, and that is what Ambari uses to raise the warning.

    As far as I know, the only way to clear these warnings is to wait for HDFS to auto-fix them by replacing the corrupt replicas with good ones copied from the datanodes that still hold healthy replicas.
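Since waiting is the remedy, it can help to poll the counter until it drops to zero. A minimal sketch; the `run_report` stub is a hypothetical stand-in for `hdfs dfsadmin -report` so the snippet runs without a cluster (replace it with the real command, and use a longer sleep interval, in practice):

```shell
# Poll the corrupt-replica counter until it reaches 0 or we give up.
# run_report is a stand-in for `hdfs dfsadmin -report`.
run_report() {
  echo 'Blocks with corrupt replicas: 0'
}

attempts=0
while [ "$attempts" -lt 5 ]; do
  count=$(run_report | awk -F': ' '/Blocks with corrupt replicas/ {print $2}')
  [ "$count" = "0" ] && break   # counter cleared, stop polling
  attempts=$((attempts + 1))
  sleep 1
done
echo "corrupt replicas: $count"
```

When the counter reaches 0, the Ambari warning clears on its next metrics refresh.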