apache-spark · spark-streaming · emr

EMR Spark Streaming Job Stdout logging disappears


When I launch my Spark Streaming job on EMR (cluster mode), I can see stdout from my job for the first few moments, then it disappears.

I can see the first few log lines at the following location in S3 (I set up EMR to copy logs to my S3 bucket): s3-us-west-1.amazonaws.com//spark/logs/j-IEMN2TMESREK/containers/application_1454718762107_0001/container_1454718762107_0001_01_000001/stdout.gz

After the streaming job has been running for ~10 seconds, no more stdout is delivered to the logs.

Is EMR redirecting stdout somewhere else?


Solution

  • It turned out my executors were not getting the log4j configuration.

    I used a bootstrap step to place log4j.properties at /tmp/log4j.properties.

    Then I ran spark-submit with the following arguments:

    --conf spark.executor.extraJavaOptions=-Dlog4j.configuration=/tmp/log4j.properties --files file:///tmp/log4j.properties
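For reference, a minimal log4j.properties that keeps executor output flowing to the console appender (which YARN captures into the container's stdout/stderr logs) might look like the sketch below. This is an assumed example, not the exact file from the original setup; adjust logger names and levels to taste:

```properties
# Route everything at INFO and above to the console appender,
# so YARN can capture it into the container logs that EMR ships to S3.
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

The `--files file:///tmp/log4j.properties` flag ships the file to each executor's working directory, and the `-Dlog4j.configuration` JVM option tells log4j where to load it from, so the executors no longer fall back to a default configuration that drops the output.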