I'm running Spark 1.6 in cluster mode on EMR 4.3.0 with the following settings:
[
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.executor.cores": "16"
    }
  },
  {
    "classification": "spark",
    "properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]
With the following instances:
master: 1 * m3.xlarge
core: 2 * m3.xlarge
When I test the number of executors with:
val numExecutors = sc.getExecutorStorageStatus.size - 1
I only get 2.
Are the EMR settings for Spark somehow being overwritten?
OK, here is the problem: you are setting the number of cores for each executor, not the number of executors, e.g. "spark.executor.cores": "16".
And since you are on AWS EMR, this also means that you are using YARN.
By default, the number of executor instances is 2 (spark.executor.instances
is the property that defines the number of executors).
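If the goal is a fixed number of executors, one way (a sketch, not taken from the question; the values "4" and "2" below are only placeholders you would tune to your own cluster) is to set spark.executor.instances explicitly in the spark-defaults classification:
[
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.executor.instances": "4",
      "spark.executor.cores": "2"
    }
  }
]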
Note on spark.dynamicAllocation.enabled: if both spark.dynamicAllocation.enabled and spark.executor.instances are specified, dynamic allocation is turned off and the specified number of spark.executor.instances is used.
Thus you get the following:
scala> val numExecutors = sc.getExecutorStorageStatus.size - 1
numExecutors: Int = 2
This means that you are actually using two executors, one per slave node, each operating on only one core.
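If you want to verify which settings actually reached the driver (rather than guessing whether EMR overwrote them), here is a minimal sketch you can paste into spark-shell, assuming the Spark 1.6 SparkContext/SparkConf API:
// Dump the executor-related properties the driver actually received;
// anything missing here was never set and falls back to the defaults.
sc.getConf.getAll
  .filter { case (k, _) => k.startsWith("spark.executor") || k.startsWith("spark.dynamicAllocation") }
  .foreach { case (k, v) => println(s"$k = $v") }

// Count of registered executors (storage statuses minus the driver).
val numExecutors = sc.getExecutorStorageStatus.size - 1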