We are using cloudera CDH 5.3. I am facing a problem wherein the size of "/dfs/dn/current/Bp-12345-IpAddress-123456789/dncp-block-verification.log.curr" and "dncp-vlock-verification.log.prev" keeps increasing to TBs within hours. I read in some of the blogs and they mention it is an HDFS bug. A temporary solution to this problem is to stop the datanode services and delete these files. But we have observed that the log file increases in size on either of the datanodes (even on the same node after deleting it). Thus, it requires continuous monitoring.
Does anyone have a permanent solution to this problem?
One solution, although slightly drastic, is to disable the block scanner entirely, by setting into the HDFS DataNode configuration the key dfs.datanode.scan.period.hours
to 0
(default is 504
in hours). The negative effect of this is that your DNs may not auto-detect corrupted block files (and would need to wait upon a future block reading client to detect them instead); this isn't a big deal if your average replication is 3-ish, but you can consider the change as a short term one until you upgrade to a release that fixes the issue.
Note that this problem will not happen if you upgrade to the latest CDH 5.4.x or higher release versions, which includes the HDFS-7430 rewrite changes and associated bug fixes. These changes have done away with the use of such a local file, thereby removing the problem.