apache-spark pyspark spark-submit

Setting default.parallelism in spark-submit command


What is the syntax to change the default parallelism when submitting a job with spark-submit?

I can specify the number of executors, executor cores, and executor memory with the following command when submitting my Spark job:

spark-submit --num-executors 9 --executor-cores 5 --executor-memory 48g

In code, I set the parallelism like this:

spark.conf.set("spark.default.parallelism",90)

If I were to set it in the spark-submit command instead, would it be the following?

spark-submit --default.parallelism 90

Solution

  • According to the Spark documentation on launching applications with spark-submit, the command has the following syntax:

    ./bin/spark-submit \
      --class <main-class> \
      --master <master-url> \
      --deploy-mode <deploy-mode> \
      --conf <key>=<value> \
      ... # other options
      <application-jar> \
      [application-arguments]
    

    In your case, there is no dedicated --default.parallelism flag; pass the property through --conf as a key=value pair instead:

    spark-submit [...] --conf spark.default.parallelism=90
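
    If the job is launched from a Python script, the same command can be assembled programmatically. The following is a minimal sketch, not part of Spark: the helper name build_spark_submit_args and the application file my_job.py are hypothetical placeholders. It shows that options with a dedicated flag (such as --num-executors) are passed directly, while Spark properties such as spark.default.parallelism are each passed as --conf key=value.

```python
import shlex


def build_spark_submit_args(app, conf=None, options=None):
    """Assemble a spark-submit argument list.

    Hypothetical helper for illustration only; it is not a Spark API.

    options: mapping of dedicated flags, e.g. {"--num-executors": 9}
    conf:    mapping of Spark properties, each emitted as --conf key=value
    """
    args = ["spark-submit"]
    for flag, value in (options or {}).items():
        args += [flag, str(value)]
    for key, value in (conf or {}).items():
        # Properties without a dedicated flag go through --conf key=value.
        args += ["--conf", f"{key}={value}"]
    # The application file comes after all options.
    args.append(app)
    return args


args = build_spark_submit_args(
    "my_job.py",  # placeholder application file (assumption)
    options={
        "--num-executors": 9,
        "--executor-cores": 5,
        "--executor-memory": "48g",
    },
    conf={"spark.default.parallelism": 90},
)

# Print the command as a single shell-safe string.
print(shlex.join(args))
```

    The resulting list can be handed to subprocess.run(args) on a machine where spark-submit is on the PATH. Building the command as a list rather than a single string avoids shell-quoting problems when a property value contains spaces.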