I have a cluster configured with two hosts.
It seems the jobs I am running are creating huge logs, and one of my HDFS DataNodes is reporting a critical health issue -
A few questions:
As shown above, the /var/log/hadoop-hdfs directory is only 610 MB, so where is the space in HDFS actually being occupied?
How can I configure the log files to be deleted periodically?
I have HDFS, Spark, and YARN (MR2) services up and running, and they all create their own logs. I would like to clean those up as well.
Thanks!
After digging more into HDFS -
To see how much space each directory is using, run:
hadoop fs -du -h /user/
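If the top-level totals don't immediately show where the space is going, the same command can be run on the subdirectories to drill down; for example (the paths below are just the ones from my cluster):

hadoop fs -du -h /user/spark/
hadoop fs -du -h /user/spark/applicationHistory/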
Spark service log creation:
I identified the location of the logs created by Spark, deleted them manually, and the cluster is back in a healthy state.
Spark creates its logs in HDFS at:
/user/spark/applicationHistory
The log files totaled 129 GB (now deleted).
Commands used (since -rm moves files to Trash, we also need to remove them from Trash to clean up properly):
$ hadoop fs -rm /user/spark/applicationHistory/*
$ hadoop fs -rm -r /user/cloudera/.Trash/Current
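Alternatively, -rm accepts a -skipTrash option that deletes the files immediately and avoids the second step (use with care, since there is no way to recover them afterwards):

$ hadoop fs -rm -r -skipTrash /user/spark/applicationHistory/*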
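To stop /user/spark/applicationHistory from growing unbounded again, the Spark History Server has a built-in cleaner that can be enabled in spark-defaults.conf (or the equivalent setting in your cluster manager). I haven't verified these exact values on my cluster, so treat them as a starting point:

# spark-defaults.conf on the History Server host - illustrative values
# Run the cleaner once a day and drop event logs older than 7 days
spark.history.fs.cleaner.enabled   true
spark.history.fs.cleaner.interval  1d
spark.history.fs.cleaner.maxAge    7d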