I have a Spark Streaming (2.1) job running in cluster mode, and I keep running into an issue where the job gets killed by the resource manager after a few weeks because the YARN container logs fill up the disk. Is there a way to avoid this?
I currently have the two settings below for log size, but they are not helping with the situation above:

spark.executor.logs.rolling.maxRetainedFiles 2
spark.executor.logs.rolling.maxSize 107374182
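For reference, the Spark configuration docs note that these executor-log rolling options only take effect when spark.executor.logs.rolling.strategy is also set (to time or size); a minimal sketch in spark-defaults.conf would be:

spark.executor.logs.rolling.strategy size
spark.executor.logs.rolling.maxSize 107374182
spark.executor.logs.rolling.maxRetainedFiles 2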
Thanks!
The best approach is to create a separate log4j properties file for the Spark Streaming job and, instead of the console appender, use a rolling file appender that caps the file size and the number of backup files. You can create /etc/spark/conf/spark-stream-log4j.properties like the following:
log4j.rootCategory=INFO, filerolling
log4j.appender.filerolling=org.apache.log4j.RollingFileAppender
log4j.appender.filerolling.layout=org.apache.log4j.PatternLayout
log4j.appender.filerolling.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.filerolling.MaxFileSize=3MB
log4j.appender.filerolling.MaxBackupIndex=15
log4j.appender.filerolling.File=/var/log/hadoop-yarn/containers/spark.log
log4j.appender.filerolling.Encoding=UTF-8
## To minimize the logs
log4j.logger.org.apache.spark=ERROR
log4j.logger.com.datastax=ERROR
log4j.logger.org.apache.hadoop=ERROR
log4j.logger.hive=ERROR
log4j.logger.org.apache.hadoop.hive=ERROR
log4j.logger.org.spark_project.jetty.server.HttpChannel=ERROR
log4j.logger.org.spark_project.jetty.servlet.ServletHandler=ERROR
log4j.logger.org.apache.kafka=INFO
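If you would rather keep the rolled files inside each container's own YARN log directory (so they are cleaned up together with the container and stay visible through the YARN log tooling), the Spark-on-YARN documentation exposes that directory as spark.yarn.app.container.log.dir, which you can reference in the File setting, for example:

log4j.appender.filerolling.File=${spark.yarn.app.container.log.dir}/spark.log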
The spark-submit command then looks like:
spark-submit \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=spark-stream-log4j.properties -XX:+UseConcMarkSweepGC -XX:OnOutOfMemoryError='kill -9 %p'" \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=spark-stream-log4j.properties -XX:+UseConcMarkSweepGC -XX:OnOutOfMemoryError='kill -9 %p'" \
  --files /etc/spark/conf/spark-stream-log4j.properties
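Note that --files ships /etc/spark/conf/spark-stream-log4j.properties into the working directory of every YARN container, which is why -Dlog4j.configuration can refer to the file by its bare name. In cluster mode the driver also runs inside a YARN container, so the spark.driver.extraJavaOptions setting picks up the same file. Your application jar and its arguments follow after the options as usual.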