Tags: java, scala, cluster-computing, apache-spark, hadoop-yarn

How to set the number of Spark executors?


How can I configure the number of executors from Java (or Scala) code, given a SparkConf and SparkContext? I constantly see 2 executors. It looks like spark.default.parallelism doesn't work here and is about something different.

I just need to set the number of executors equal to the cluster size, but there are always only 2 of them. I know my cluster size. I run on YARN, if that matters.


Solution

  • OK, got it. The number of executors is not actually a Spark property itself, but rather a setting the driver uses when placing the job on YARN. Since I'm using the SparkSubmit class as the driver, it has the appropriate --num-executors parameter, which is exactly what I need.

    UPDATE:

    For some jobs I no longer follow the SparkSubmit approach. I mainly can't use it for applications where the Spark job is only one component of the application (and is even optional). For these cases I use a spark-defaults.conf attached to the cluster configuration, with the spark.executor.instances property set inside it. This approach is much more universal, letting me balance resources properly depending on the cluster (developer workstation, staging, production). The same property can also be set programmatically, as sketched below.
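
    A minimal Scala sketch of the programmatic variant, assuming a YARN cluster: the executor count is fixed via spark.executor.instances on the SparkConf before the SparkContext is created. The app name and the count of 8 are hypothetical placeholders; the equivalent spark-submit flag and spark-defaults.conf entry are noted in the comments.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    object ExecutorCountExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("executor-count-example")  // hypothetical app name
          .set("spark.executor.instances", "8")  // hypothetical count; match your cluster size
          // equivalent on the command line: spark-submit --num-executors 8 ...
          // or in spark-defaults.conf:      spark.executor.instances  8

        val sc = new SparkContext(conf)
        // ... run the job ...
        sc.stop()
      }
    }
    ```

    Note that the property must be set before the SparkContext is constructed; once the context exists on YARN (without dynamic allocation), the executor count is fixed.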