
Cloudera Manager - HDFS Free Space Health Issues Troubleshoot


I have a cluster configured with two hosts.

Host configurations:

It seems the jobs I am running are creating huge logs, and one of my HDFS DataNodes shows a critical health issue:

Critical health issue for one of the HDFS DataNodes:

Four questions:

  1. How can I clean up these logs and free up the space? Is deleting them manually from /var/log/hadoop-hdfs a good idea?

hadoop-hdfs status

  2. As shown above, the /var/log/hadoop-hdfs directory is only 610 MB, so where is the space in HDFS actually being occupied?

  3. How can I configure the log files to be deleted periodically?

  4. I have HDFS, Spark, and YARN (MR2) services up and running, and they all create their own logs. I would like to clean those up as well.

Thanks!


Solution

  • After digging more into HDFS:

    To see how much space each directory occupies, execute: hadoop fs -du -h /user/
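
    If a single summarized total per directory is easier to scan, -du also supports a -s flag; a quick sketch (/user/spark here is just an illustrative path):

    $ hadoop fs -du -s -h /user/spark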

    Spark service log creation:

    I identified the location of the logs created by Spark, deleted them manually, and the cluster is back in a healthy state. Spark creates its logs in HDFS at:

    /user/spark/applicationHistory
    

    The log files totaled 129 GB (deleted).
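
    To keep these event logs from piling up again, the Spark History Server ships with a periodic cleaner that can be enabled. A minimal sketch for spark-defaults.conf (the retention values are illustrative; on a Cloudera Manager cluster these would typically be set through the Spark service configuration rather than by editing the file directly):

    # Enable automatic cleanup of old application event logs
    spark.history.fs.cleaner.enabled    true
    # How often the cleaner checks for expired logs
    spark.history.fs.cleaner.interval   1d
    # Event logs older than this are deleted
    spark.history.fs.cleaner.maxAge     7d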

    Commands used (since -rm only moves files to the Trash, we also need to remove them from the Trash to clean up properly):

    $ hadoop fs -rm /user/spark/applicationHistory/*
    
    $ hadoop fs -rm -r /user/cloudera/.Trash/Current
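
    Alternatively, the Trash round-trip can be skipped entirely with the -skipTrash option, which deletes immediately; a sketch (use with care, as this is unrecoverable):

    $ hadoop fs -rm -r -skipTrash /user/spark/applicationHistory/*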