When I launch my Spark Streaming job on EMR (cluster mode), I can see my job's stdout for the first few moments, then it disappears...
I can see the first few log lines at the following location in S3 (I set up EMR to copy logs to my S3 bucket): s3-us-west-1.amazonaws.com//spark/logs/j-IEMN2TMESREK/containers/application_1454718762107_0001/container_1454718762107_0001_01_000001/stdout.gz
After the streaming job has been running for ~10 seconds, no more stdout is delivered to the logs.
Is EMR redirecting stdout somewhere else?
It turned out my executors were not getting the log4j configuration.
I used a bootstrap action to place log4j.properties at /tmp/log4j.properties.
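For reference, a minimal log4j 1.x configuration that routes logging to the console (which YARN then captures into each container's stdout/stderr logs) might look like the following; the exact appender and pattern layout are an assumption on my part, not taken from the original setup:

```properties
# Minimal log4j 1.x config: send everything at INFO and above to the console,
# so YARN can capture it into the container logs that EMR ships to S3.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```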
Then I ran spark-submit with the following arguments:

--conf spark.executor.extraJavaOptions=-Dlog4j.configuration=/tmp/log4j.properties
--files file:///tmp/log4j.properties
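Put together, the full invocation looks roughly like this sketch; the application jar, main class, and master settings are placeholders (not from my actual job), and only the two log4j arguments above are the essential part:

```shell
# Hypothetical full spark-submit invocation; jar name, class, and master
# are placeholders. --files ships log4j.properties to each container, and
# the executor JVM option points log4j at that file.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.extraJavaOptions=-Dlog4j.configuration=/tmp/log4j.properties \
  --files file:///tmp/log4j.properties \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar
```

Note that spark.executor.extraJavaOptions only affects executors; if driver logging also goes missing, the analogous spark.driver.extraJavaOptions setting may be needed as well.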