apache-spark, spark-streaming

Spark streaming job log size overflow


I have a Spark Streaming (2.1) job running in cluster mode, and I keep running into an issue where the job gets killed by the resource manager after a few weeks because the YARN container logs fill up the disk. Is there a way to avoid this?

I currently set the two settings below for log size, but they are not helping with the situation above.

spark.executor.logs.rolling.maxRetainedFiles 2
spark.executor.logs.rolling.maxSize 107374182

Thanks!


Solution

  • The best approach is to create a separate log4j properties file for the Spark Streaming jobs and, instead of the console appender, use a rolling file appender that caps the file size and the number of backup files. You can create /etc/spark/conf/spark-stream-log4j.properties like the following:

    log4j.rootCategory=INFO, filerolling
    
    log4j.appender.filerolling=org.apache.log4j.RollingFileAppender
    log4j.appender.filerolling.layout=org.apache.log4j.PatternLayout
    log4j.appender.filerolling.layout.conversionPattern=[%d] %p %m (%c)%n
    log4j.appender.filerolling.maxFileSize=3MB
    log4j.appender.filerolling.maxBackupIndex=15
    log4j.appender.filerolling.file=/var/log/hadoop-yarn/containers/spark.log
    
    log4j.appender.filerolling.encoding=UTF-8
    
    ## To minimize the logs
    log4j.logger.org.apache.spark=ERROR
    log4j.logger.com.datastax=ERROR
    log4j.logger.org.apache.hadoop=ERROR
    log4j.logger.hive=ERROR
    log4j.logger.org.apache.hadoop.hive=ERROR
    log4j.logger.org.spark_project.jetty.server.HttpChannel=ERROR
    log4j.logger.org.spark_project.jetty.servlet.ServletHandler=ERROR
    log4j.logger.org.apache.kafka=INFO
    
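
    One caveat: with a fixed path like /var/log/hadoop-yarn/containers/spark.log, every container on the same host writes to the same file. If you would rather keep each container's log in its own YARN log directory (so YARN's log aggregation and cleanup still apply), Spark's YARN documentation describes a path substitution you can use instead; a sketch, assuming the properties file is shipped with --files as below:

    ```properties
    # Alternative: write into the per-container YARN log directory.
    # Spark substitutes spark.yarn.app.container.log.dir at runtime on YARN.
    log4j.appender.filerolling.file=${spark.yarn.app.container.log.dir}/spark.log
    ```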

    The spark-submit command then looks like this:

    spark-submit \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=spark-stream-log4j.properties -XX:+UseConcMarkSweepGC -XX:OnOutOfMemoryError='kill -9 %p'" \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=spark-stream-log4j.properties -XX:+UseConcMarkSweepGC -XX:OnOutOfMemoryError='kill -9 %p'" \
      --files /etc/spark/conf/spark-stream-log4j.properties
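
    With the config above you can also put a hard upper bound on disk usage: the appender keeps the active file plus maxBackupIndex backups, so 3MB × (15 + 1) ≈ 48MB of spark.log per container. A quick sanity check of that arithmetic:

    ```shell
    # Upper bound on disk used by the rolling appender per container:
    # maxFileSize * (maxBackupIndex + 1) -- the active file plus its backups.
    MAX_FILE_SIZE_MB=3
    MAX_BACKUP_INDEX=15
    echo "$(( MAX_FILE_SIZE_MB * (MAX_BACKUP_INDEX + 1) )) MB per container"
    ```

    Multiply by the number of containers per host to estimate the worst-case footprint on a node.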