I'm trying to run a Spark application written in Scala 2.11.8 with Spark 2.1 on an EMR cluster, release 5.3.0. I configured the cluster with the following JSON:
[
  {
    "Classification": "hadoop-env",
    "Configurations": [
      {
        "Classification": "export",
        "Configurations": [],
        "Properties": {
          "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
        }
      }
    ],
    "Properties": {}
  },
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Configurations": [],
        "Properties": {
          "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
        }
      }
    ],
    "Properties": {}
  }
]
If I run the application in client mode, everything runs fine. When I run it in cluster mode, it fails with exit code 12.
Here is part of the master log where I see the status code:
17/02/01 10:08:26 INFO TaskSetManager: Finished task 79.0 in stage 0.0 (TID 79) in 293 ms on ip-10-234-174-231.us-west-2.compute.internal (executor 2) (78/11102)
17/02/01 10:08:27 INFO YarnAllocator: Driver requested a total number of 19290 executor(s).
17/02/01 10:08:27 INFO ApplicationMaster: Final app status: FAILED, exitCode: 12, (reason: Exception was thrown 1 time(s) from Reporter thread.)
17/02/01 10:08:27 INFO SparkContext: Invoking stop() from shutdown hook
As part of the job I need to read some data from S3, something like this:
sc.textFile("s3n://stambucket/impressions/*/2017-01-0[1-9]/*/impression_recdate*")
If I read only one day, there are no errors.
But with 9 days I get exit code 12. It's even weirder considering that the same 9 days run just fine in client mode.
On Linux, 12 is the standard error number for out of memory (ENOMEM), so this exit code points to a memory problem.
By default, Spark sets the amount of memory per executor process (spark.executor.memory) to 1 GB. EMR won't override this value regardless of how much memory is available on the cluster's nodes/master. One possible fix is to set the maximizeResourceAllocation flag to true.
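As a rough sketch, assuming the flag is supplied at cluster creation in the same configuration format used in the question, it would go under the "spark" classification. This entry would sit alongside the existing hadoop-env and spark-env entries; nothing else about the cluster setup is assumed to change:

```json
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]
```

With this set, EMR derives the executor settings from the hardware of the core nodes instead of leaving the defaults in place, so it is worth checking the resulting values in spark-defaults.conf before rerunning the 9-day job.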