scala, apache-spark, configuration, cluster-computing, spark-submit

What is the difference between defining Spark Master in the CLI vs defining 'master' in the Spark application code?


What is the difference between setting the master with the spark-submit "--master" option on the command line and setting it in the Spark application code?

In Spark, we can specify the master URI in the application code, like below:

[Image: Spark master configured in application code]
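The original screenshot is not available here. As a minimal sketch of what setting the master in application code usually looks like, assuming a Spark SQL application built on SparkSession: the object name, the application name, and the `local[2]` master URL below are all illustrative and not taken from the original post.

```scala
import org.apache.spark.sql.SparkSession

object HardCodedMasterApp {
  // Hypothetical helper: builds a session with the master fixed in code.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("hard-coded-master-example") // illustrative application name
      .master("local[2]")                   // illustrative master URL, set explicitly in code
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    // Print the master the running application actually ended up with.
    println(s"Effective master: ${spark.sparkContext.master}")
    spark.stop()
  }
}
```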

Or we can specify the master URI as the value of the spark-submit "--master" option, like below:

[Image: spark-submit master option]

Does one take precedence over the other? Do they have to agree, so that the same URI appears both in the spark-submit command and in the application code that creates the SparkSession? Will one override the other? What does the SparkSession do with its master argument, and what does spark-submit do differently with its master parameter?

Any help would be greatly appreciated. Thank you!


Solution

  • To quote the official documentation:

    The spark-submit script can load default Spark configuration values from a properties file and pass them on to your application. By default, it will read options from conf/spark-defaults.conf in the Spark directory. For more detail, see the section on loading default configurations.

    Loading default Spark configurations this way can obviate the need for certain flags to spark-submit. For instance, if the spark.master property is set, you can safely omit the --master flag from spark-submit. In general, configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file.

    If you are ever unclear where configuration options are coming from, you can print out fine-grained debugging information by running spark-submit with the --verbose option.

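    So all are valid options, and there is a well-defined hierarchy that determines precedence if the same option is set in multiple places. Applied to the question: a master set in the application code overrides the --master flag, so the two do not have to agree. From highest to lowest: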

    • Explicit settings in the application (values set on a SparkConf).
    • Command-line arguments passed to spark-submit.
    • Options from the configuration files (by default conf/spark-defaults.conf).
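Because an explicit setting in the application wins, a hard-coded master silently overrides whatever is passed to spark-submit. One common way around this is to leave the master unset in code, or to supply only a fallback. The sketch below is one possible approach rather than a prescribed pattern; it assumes that `new SparkConf()` picks up the `spark.*` system properties that spark-submit populates, and the object name, application name, and `local[*]` fallback are illustrative.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object SubmitTimeMasterApp {
  // Hypothetical helper: keeps any master supplied from outside the code
  // (spark-submit flag or defaults file) and only falls back to a local
  // master when none was provided, for example when run from an IDE.
  def buildConf(): SparkConf =
    new SparkConf()                             // loads spark.* system properties
      .setAppName("submit-time-master-example") // illustrative application name
      .setIfMissing("spark.master", "local[*]") // fallback only, never an override

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().config(buildConf()).getOrCreate()
    // Shows which master actually took effect.
    println(s"Effective master: ${spark.sparkContext.master}")
    spark.stop()
  }
}
```

With this shape, the value given to `--master` at submit time is the one that takes effect, and the same build can run locally without changes. Running spark-submit with `--verbose`, as the quoted documentation suggests, shows where each setting came from.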