apache-spark, emr

EMR Spark job using fewer executors than nodes in the cluster


I have set up a test cluster consisting of 1 m4.large driver node and 3 m4.large worker nodes. I wanted to test this cluster configuration without adding any extra configuration arguments to spark-submit. However, when I check the Spark UI I can see that my Spark job only uses 2 executors, and I also notice in Ganglia that one node is barely doing anything (as if it is not being used at all).

What can I do to make sure that all nodes are getting tasks to complete?


Solution

  • spark-submit doesn't use the whole cluster unless you specify the number of executors, the executor cores, and the executor memory. By default it uses the configuration specified in the spark-defaults.conf file inside the Spark installation directory (an illustrative snippet follows after this answer).

    By default, spark-submit launches 2 executors with 512 MB of memory each. So if you want to use the whole cluster, run spark-submit with --num-executors, --executor-cores, and --executor-memory specified (see the example command below).

    You can find further examples here.
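
As a rough illustration of the defaults described above, here is a minimal sketch of the relevant spark-defaults.conf entries. The exact values and file location vary by Spark version and EMR release, so treat these as assumptions rather than guaranteed defaults:

    # Illustrative spark-defaults.conf entries (values are assumptions,
    # not guaranteed defaults for your Spark/EMR release):
    spark.executor.instances   2
    spark.executor.memory      512m
    spark.executor.cores       1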
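And here is a minimal sketch of a spark-submit invocation that asks YARN for one executor per worker node. The script name my_job.py is a placeholder, and the resource values are assumptions sized for m4.large nodes (2 vCPUs, 8 GB RAM each); adjust them to whatever YARN actually makes available on your cluster:

    # Sketch: one 2-core, 4 GB executor on each of the 3 worker nodes
    # (my_job.py is a placeholder for your application)
    spark-submit \
      --master yarn \
      --deploy-mode client \
      --num-executors 3 \
      --executor-cores 2 \
      --executor-memory 4g \
      my_job.py

If a node still sits idle after this, compare the requested memory (executor memory plus overhead) against what YARN reports as available per node in the ResourceManager UI; an executor that doesn't fit on a node will be scheduled elsewhere.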