What is the difference between the master set with spark-submit "--master" on the CLI and the master set in Spark application code?
In Spark, we can specify the master URI in the application code, when creating the SparkSession.
Or we can pass the master URI to spark-submit as the argument of the --master option.
Does one take precedence over the other? Do the two have to agree, so that the same URI appears both in the spark-submit command and in the application code that creates the SparkSession? Will one override the other? What does the SparkSession do with its master argument, and how does that differ from what spark-submit does with its --master parameter?
Any help would be greatly appreciated. Thank you!
To quote the official documentation:
The spark-submit script can load default Spark configuration values from a properties file and pass them on to your application. By default, it will read options from conf/spark-defaults.conf in the Spark directory. For more detail, see the section on loading default configurations.
Loading default Spark configurations this way can obviate the need for certain flags to spark-submit. For instance, if the spark.master property is set, you can safely omit the --master flag from spark-submit. In general, configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file.
If you are ever unclear where configuration options are coming from, you can print out fine-grained debugging information by running spark-submit with the --verbose option.
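So all of these are valid options, and there is a well-defined hierarchy that determines precedence if the same option is set in multiple places. From highest to lowest:
1. Values set explicitly in the application code (on a SparkConf).
2. Flags passed to spark-submit, such as --master.
3. Values in the defaults file, conf/spark-defaults.conf.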
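As a rough illustration of the ordering the documentation describes, here is a toy model in plain Python. It is not Spark's actual implementation. The function name resolve_setting, the three dictionaries, and the example URIs are all hypothetical. It only shows the idea that the first source in the precedence order to define a key wins.

```python
from typing import Mapping, Optional


def resolve_setting(
    key: str,
    app_conf: Mapping[str, str],
    submit_flags: Mapping[str, str],
    defaults_file: Mapping[str, str],
) -> Optional[str]:
    """Return the value of `key` from the highest-precedence source that sets it.

    Hypothetical helper: it models the documented order only
    (application SparkConf, then spark-submit flags, then the defaults file).
    """
    for source in (app_conf, submit_flags, defaults_file):
        if key in source:
            return source[key]
    # No source defines the key.
    return None


# Made-up example values for each source.
defaults_file = {"spark.master": "spark://defaults-host:7077"}
submit_flags = {"spark.master": "yarn"}
app_conf = {"spark.master": "local[*]"}

# All three set spark.master: the application code wins.
in_code = resolve_setting("spark.master", app_conf, submit_flags, defaults_file)

# The application code leaves it unset: the spark-submit flag wins.
from_flag = resolve_setting("spark.master", {}, submit_flags, defaults_file)

# Only the defaults file sets it.
from_defaults = resolve_setting("spark.master", {}, {}, defaults_file)
```

In this model the sources do not have to agree. When they differ, the value from the application code is the one that is used, which matches the quoted statement that values set on a SparkConf take the highest precedence.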