
When is data deleted from data nodes in case of hdfs dfs -rmr on a folder?


We know that when we run the rmr command, an edit log entry is created. Do the DataNodes wait for the FSImage to be updated before purging the data, or does that happen concurrently? Is there any precondition around acknowledgement of the transaction from the JournalNodes? I am trying to understand how HDFS edits work when they involve a massive change in disk usage. How long will it take before 'hdfs dfs -du -s -h /folder' and 'hdfs dfsadmin -report' reflect the decrease in size? We tried deleting 2 TB of data, and after 1 hour the DataNodes' local folder (/data/yarn/datanode) still had not shrunk by 2 TB.


Solution

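  • When trash is enabled, deleting data from HDFS only moves it into a trash folder, so the disk space is not freed right away. To empty the trash, run the command below:

    hadoop fs -expunge
    

    This permanently deletes the trash checkpoints that are older than the retention threshold, and HDFS then releases their space.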

    Alternatively, pass -skipTrash when deleting so that the data bypasses the trash (-rmr is the deprecated spelling of -rm -r):

    hadoop fs -rm -r -skipTrash /folder
    

    The data is then deleted immediately instead of being moved into the trash.

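    Note: A file remains in the trash (by default /user/<username>/.Trash) for a configurable amount of time, set by fs.trash.interval (in minutes) in core-site.xml. After that time expires, the NameNode deletes the file from the HDFS namespace, which frees its blocks. The DataNodes then remove the block files asynchronously, so there can be an appreciable delay between the deletion and the corresponding increase in free space.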
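
The checks described above can be strung together as a small script. This is only a sketch: it assumes the default per-user trash location (/user/<username>/.Trash) and that the HDFS user matches the local user name. DRY_RUN, run and reclaim_trash are hypothetical names introduced for illustration. With DRY_RUN=1 (the default here) the script only prints the commands, so it can be reviewed before touching a cluster.

```shell
#!/bin/sh
# Sketch: inspect and empty the HDFS trash for the current user.
# Assumption: trash lives at the default /user/<username>/.Trash
# and the HDFS user name equals the local user name.
DRY_RUN="${DRY_RUN:-1}"              # hypothetical switch: 1 = print only
TRASH_DIR="/user/$(id -un)/.Trash"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reclaim_trash() {
    run hadoop fs -du -s -h "$TRASH_DIR"   # space still held by the trash
    run hadoop fs -expunge                 # drop checkpoints past the retention threshold
    run hdfs dfsadmin -report              # cluster-wide usage as seen by the NameNode
}

reclaim_trash
```

Run it with DRY_RUN=0 to execute the commands for real. Even then, the numbers reported by the DataNodes may lag behind, because block files are removed asynchronously.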