I have set up a testing cluster consisting of 1 m4.large driver and 3 m4.large nodes, and I wanted to test this configuration without adding any extra arguments to spark-submit. However, when I check the Spark UI I can see that my Spark job only uses 2 executors, and I also notice in Ganglia that one node is barely doing anything (as if it's not used at all).
What can I do to make sure that all nodes are getting tasks to complete?
spark-submit doesn't use the whole cluster unless you specify the number of executors, --executor-cores and --executor-memory. If you omit them, it falls back to the Spark default configuration; you can see those defaults in the spark-defaults.conf file inside the Spark installation directory.
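For illustration, the relevant entries in spark-defaults.conf might look like the following (the values shown here are assumptions, not necessarily what your installation ships with):

    spark.executor.instances   2
    spark.executor.memory      512m
    spark.executor.cores       1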
Now by default spark-submit launches 2 executors with 512 MB of memory each. So if you want to use the whole cluster, run spark-submit with --num-executors, --executor-cores and --executor-memory set explicitly.
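As a rough sketch for m4.large workers (2 vCPUs and 8 GiB RAM each), something along these lines should spread tasks across all 3 nodes; the class name, jar path and memory figure are placeholders you need to adapt to your job:

    spark-submit \
      --num-executors 3 \
      --executor-cores 2 \
      --executor-memory 5g \
      --class com.example.MyApp \
      my-app.jar

Leaving some headroom below the full 8 GiB per node is generally a good idea, since the OS and the cluster manager (e.g. YARN) need memory too, which is why the example requests 5g rather than the whole machine.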
You can find examples here.