Dear friends and colleagues,
We have an Ambari cluster with Hadoop version 2.6.4. The cluster includes 52 DataNode machines, and the following issue has occurred on 9 of them.
Let me explain the problem:
We noticed a critical problem with the YARN logs.
We saw that stderr and stdout are huge files. In our case, sdb is the relevant disk and it is only 20G in size, while stderr and stdout are about 7G each.
As a result, /grid/sdb became full.
My question is: is it possible to limit the size of these files?
[root@datanode04 container_e41_1549894743658_0020_02_000002]# df -h /grid/sdb
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         20G   20G  712K 100% /grid/sdb
[root@datanode04 container_e41_1549894743658_0020_02_000002]# pwd
/grid/sdb/hadoop/yarn/log/application_1549894743658_0020/container_e41_1549894743658_0020_02_000002
[root@datanode04 container_e41_1549894743658_0020_02_000002]# du -sh *
6.9G stderr
6.9G stdout
This is a common scenario: log files grow large in a Hadoop cluster because multiple running services keep accumulating logs. If you are running an Ambari-managed Hadoop cluster, you should configure log4j.properties through Ambari. You can do this for each service running in your cluster, and it will ensure log rotation and retention.
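For example, a size-capped rolling appender in a service's log4j.properties might look like the snippet below. This is only an illustrative sketch, not the exact Ambari template: the appender name RFA, the file reference, and the size/backup values are placeholders that you would adapt per service.

# Illustrative sketch only - appender name, file path, and limits are placeholders
log4j.rootLogger=INFO, RFA
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=10
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

With MaxFileSize and MaxBackupIndex set, the file is rotated once it reaches the size limit and only the configured number of backups is kept, so the log directory cannot grow without bound.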
Here is a reference link from Hortonworks (HDP) where you can find information about configuring the log4j properties of the different services running in a Hadoop cluster. Hope this helps.