Tags: log4j, hadoop-yarn, ambari

yarn logs - stdout and stderr became huge files - how to avoid that


Dear friends and colleagues,

We have an Ambari cluster with Hadoop version 2.6.4. The cluster includes 52 datanode machines, and the following issue occurred on 9 of them.

Let me explain the problem:

We noticed a critical problem regarding the YARN logs.

We saw that stderr and stdout are huge files. In our case sdb is the relevant disk and its size is only 20G, so stderr and stdout are in fact about 7G each.

So /grid/sdb became full.

My question is: is it possible to limit the size of these files?

[root@datanode04 container_e41_1549894743658_0020_02_000002]# df -h /grid/sdb
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         20G   20G  712K 100% /grid/sdb
[root@datanode04 container_e41_1549894743658_0020_02_000002]# pwd
/grid/sdb/hadoop/yarn/log/application_1549894743658_0020/container_e41_1549894743658_0020_02_000002
[root@datanode04 container_e41_1549894743658_0020_02_000002]# du -sh *
6.9G    stderr
6.9G    stdout

Solution

  • This is a common scenario in a Hadoop cluster: log files accumulate and grow large because multiple services are running on it. If you are running an Ambari-managed Hadoop cluster, you need to configure log4j.properties from Ambari. You can configure this for each service running in your Hadoop cluster, which will ensure log rotation and retention (a sketch of the relevant settings follows below).

    The Hortonworks (HDP) documentation is a useful reference for configuring the log4j properties of the different services running in a Hadoop cluster. Hope this will be helpful.
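
    As an illustration only, here is a minimal sketch of the kind of rotation settings involved, assuming the stock Hadoop log4j.properties that Ambari exposes (for example under the Advanced hdfs-log4j / yarn-log4j sections); the exact property names and default values may differ in your stack version:

    # Cap the size of each log file before it is rolled over.
    hadoop.log.maxfilesize=256MB
    # Keep at most this many rotated files; older ones are deleted.
    hadoop.log.maxbackupindex=20

    # RollingFileAppender that enforces the limits above.
    log4j.appender.RFA=org.apache.log4j.RollingFileAppender
    log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
    log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
    log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
    log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
    log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

    After changing these values in Ambari, restart the affected services so the updated log4j configuration takes effect.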