I have not specified any Spark properties in my application, so I am going with the default values. How do I find the value of a specific Spark property that is actually being used?
In the case below, why are we not able to find the value of spark.executor.cores, and why do we run into an error?
On running the code below, I get the error message - org.apache.spark.SparkNoSuchElementException: [SQL_CONF_NOT_FOUND] The SQL config "spark.executor.cores" cannot be found. Please verify that the config exists.
print(spark.conf.get("spark.executor.cores"))
print(spark.conf.get("spark.sql.shuffle.partitions"))
print(spark.sparkContext.getConf().getAll())
You're encountering 2 different types of configuration parameters, and that's what is confusing you here:
- Spark configuration parameters. They are generic to your application, whether or not you're using Spark SQL. You can access these with spark.sparkContext.getConf().get("my-conf").
- spark.sql configuration parameters. These are relevant when you're using Spark SQL (when using DataFrames, for example). You can access these by using spark.conf.get("my-sql-conf").
Now that we know this, we can understand the behaviour you're seeing.
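As a minimal illustration (assuming the SparkSession from your snippet is available as spark), here is one read from each kind of configuration:

# Application-level configuration: PySpark's SparkConf.get() takes an optional default,
# so this prints None instead of failing when the property was never explicitly set.
print(spark.sparkContext.getConf().get("spark.executor.cores", None))

# SQL / runtime configuration: accessible through spark.conf.get().
print(spark.conf.get("spark.sql.shuffle.partitions"))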
Let's answer the following 3 questions:
Why does print(spark.conf.get("spark.executor.cores")) fail with the [SQL_CONF_NOT_FOUND] error?
spark.executor.cores is not a SQL configuration parameter, so it is not accessible with spark.conf.get().
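Note that spark.conf.get() also accepts a default value as a second argument, which avoids the exception. A small sketch, assuming the same session:

# With a default, the lookup returns the default instead of raising [SQL_CONF_NOT_FOUND].
print(spark.conf.get("spark.executor.cores", "not available as a SQL conf"))

# Alternatively, catch the exception explicitly.
try:
    spark.conf.get("spark.executor.cores")
except Exception as e:
    print("Lookup failed:", type(e).__name__)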
Why does print(spark.conf.get("spark.sql.shuffle.partitions")) return 200?
Since this config param starts with spark.sql, it IS a SQL config parameter, so it's accessible with spark.conf.get(). 200 is simply its default value.
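As a side note, SQL configuration parameters like this one can also be changed at runtime. A brief sketch, assuming the same session:

# SQL configs are mutable at runtime via spark.conf.set().
spark.conf.set("spark.sql.shuffle.partitions", "64")
print(spark.conf.get("spark.sql.shuffle.partitions"))  # now prints 64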
But why don't you find spark.executor.cores in the output of the following command?
print(spark.sparkContext.getConf().getAll())
The reason is that you probably started your SparkContext without explicitly setting that configuration parameter. Parameters that were never explicitly set don't show up in the list that .getAll() returns. Of course, they do take effect, but the value used will just be the default value.
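A quick way to check whether a given key was explicitly set is SparkConf.contains(). A small sketch, assuming the same session:

conf = spark.sparkContext.getConf()
# contains() is True only for keys that were explicitly set (e.g. via --conf, spark-defaults.conf, or code).
print(conf.contains("spark.executor.cores"))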
For the following code, I will be using spark.sparkContext.getConf().get() instead of .getAll(), but they essentially do the same thing: .get() just gives the value of a specific config param.
If I start up a pyspark shell without explicitly setting that configuration parameter, I get the following:
pyspark
# Wait until the pyspark REPL is up and running
>>> print(spark.sparkContext.getConf().get("spark.executor.cores"))
None
If I start up a pyspark shell and explicitly set the spark.executor.cores
value, I get the following:
pyspark --conf spark.executor.cores=3
# Wait until the pyspark REPL is up and running
>>> print(spark.sparkContext.getConf().get("spark.executor.cores"))
3
There are 2 things to remember here:
- spark.sparkContext.getConf().get() does not always return a value for a configuration parameter. For some configuration parameters, it only returns a value if you explicitly set that value; otherwise it returns None.
- spark.conf.get() only works for SQL configuration parameters (the ones starting with spark.sql); for other properties it fails with the [SQL_CONF_NOT_FOUND] error you saw.
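If you want a single lookup that covers both kinds of parameters, here is a hypothetical helper (get_any_conf is my own name, not a Spark API), sketched assuming an existing SparkSession named spark:

def get_any_conf(spark, key, default=None):
    # Try the SQL / runtime configuration first; it raises for keys it cannot resolve.
    try:
        return spark.conf.get(key)
    except Exception:
        pass
    # Fall back to the application-level SparkConf; returns `default` if the key was never set.
    return spark.sparkContext.getConf().get(key, default)

print(get_any_conf(spark, "spark.sql.shuffle.partitions"))     # e.g. 200
print(get_any_conf(spark, "spark.executor.cores", "not set"))  # "not set" unless you set it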