I'm running Spark 1.6 in cluster mode on EMR 4.3.0 with the following settings:
[
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.executor.cores": "16"
    }
  },
  {
    "classification": "spark",
    "properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]
With the following instances:
master: 1 * m3.xlarge
core: 2 * m3.xlarge
When I test the number of executors with:
val numExecutors = sc.getExecutorStorageStatus.size - 1
I only get 2.
Are the EMR settings for Spark somehow being overwritten?
OK, here is the problem: you are setting the number of cores for each executor, not the number of executors, e.g. "spark.executor.cores": "16".
And since you are on AWS EMR, this also means that you are using YARN.
By default, the number of executor instances is 2 (spark.executor.instances
is the property that defines the number of executors).
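If the goal is a fixed number of executors, one way (a sketch, not taken from the question; the values "4" and "2" below are only placeholders you would tune to your own cluster) is to set spark.executor.instances explicitly in the spark-defaults classification:
[
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.executor.instances": "4",
      "spark.executor.cores": "2"
    }
  }
]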
Note on spark.dynamicAllocation.enabled: if both spark.dynamicAllocation.enabled and spark.executor.instances are specified, dynamic allocation is turned off and the specified number of spark.executor.instances is used.
Thus you get the following:
scala> val numExecutors = sc.getExecutorStorageStatus.size - 1
numExecutors: Int = 2
This means that you are actually using two executors, one per slave node, each operating on only one core.
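If you want to verify which settings actually reached the driver (rather than guessing whether EMR overwrote them), here is a minimal sketch you can paste into spark-shell, assuming the Spark 1.6 SparkContext/SparkConf API:
// Dump the executor-related properties the driver actually received;
// anything missing here was never set and falls back to the defaults.
sc.getConf.getAll
  .filter { case (k, _) => k.startsWith("spark.executor") || k.startsWith("spark.dynamicAllocation") }
  .foreach { case (k, v) => println(s"$k = $v") }

// Count of registered executors (storage statuses minus the driver).
val numExecutors = sc.getExecutorStorageStatus.size - 1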