Tags: apache-spark, pyspark, apache-spark-sql, spark-streaming

Spark dynamic allocation configuration settings precedence


I have a Spark job that runs on a cluster with dynamic resource allocation enabled. I submit the job with the num-executors and executor-memory properties set. Which takes precedence here? Will the job run with dynamic allocation, or with the resources I specify in the config?


Solution

  • It depends on which config parameter has the greater value:

    spark.dynamicAllocation.initialExecutors or spark.executor.instances, aka --num-executors (when launching via spark-submit on the command line)

    Here is the reference doc if you are using Cloudera on YARN; make sure you are looking at the CDH version that matches your environment:

    https://www.cloudera.com/documentation/enterprise/6/6.2/topics/cdh_ig_running_spark_on_yarn.html#spark_on_yarn_dynamic_allocation__table_tkb_nyv_yr

    And the Apache Spark documentation on dynamic allocation:

    https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation

    So to sum it up: if you pass --num-executors while dynamic allocation is enabled, that value most likely overrides dynamic allocation's initial sizing (Spark starts with the larger of the two settings), unless you set spark.dynamicAllocation.initialExecutors to a higher value.
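The precedence rule above can be sketched in plain Python. Per the Spark configuration docs, when dynamic allocation is enabled the initial executor count is the maximum of spark.dynamicAllocation.minExecutors, spark.dynamicAllocation.initialExecutors, and spark.executor.instances. The function and dict-based `conf` below are illustrative helpers, not Spark API:

```python
def initial_executor_count(conf):
    """Return the initial executor count Spark would start with
    when dynamic allocation is enabled: the max of the three
    related settings (missing settings default to 0 here)."""
    return max(
        int(conf.get("spark.dynamicAllocation.minExecutors", 0)),
        int(conf.get("spark.dynamicAllocation.initialExecutors", 0)),
        int(conf.get("spark.executor.instances", 0)),
    )

# --num-executors 10 beats initialExecutors=4:
print(initial_executor_count({
    "spark.dynamicAllocation.initialExecutors": "4",
    "spark.executor.instances": "10",  # i.e. --num-executors 10
}))  # -> 10

# initialExecutors=20 beats --num-executors 10:
print(initial_executor_count({
    "spark.dynamicAllocation.initialExecutors": "20",
    "spark.executor.instances": "10",
}))  # -> 20
```

Either way, dynamic allocation can still scale the executor count up or down afterwards between minExecutors and maxExecutors; the settings above only decide the starting point.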