
Spark SQL Configurations


What type of parameters can we set using Spark SQL? My assumption is that Spark SQL accepts only parameters prefixed with spark.sql and ignores any parameter that does not start with spark.sql; the others can only be set during Spark session creation.

For example, spark.sql.autoBroadcastJoinThreshold, spark.sql.broadcastTimeout, etc. would be accepted, while spark.maxRemoteBlockSizeFetchToMem, spark.driver.memory, etc. would be ignored. Let me know if my understanding is incorrect.
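For context, here is a minimal PySpark sketch of where each kind of setting is applied. It assumes a local Spark installation, and the behavior noted in the comments reflects Spark 3.x; it is illustrative, not a definitive reference:

```python
from pyspark.sql import SparkSession

# Core (non-SQL) configs and static SQL configs must be supplied at or before
# session creation; runtime SQL configs can also be changed later.
spark = (
    SparkSession.builder
    .config("spark.driver.memory", "4g")  # core config: fixed once the driver JVM starts
    .config("spark.sql.autoBroadcastJoinThreshold", 20 * 1024 * 1024)  # runtime SQL config
    .getOrCreate()
)

# Runtime SQL configs are mutable per session via SparkSession.conf:
spark.conf.set("spark.sql.broadcastTimeout", "600")
print(spark.conf.get("spark.sql.broadcastTimeout"))

# Static SQL configs (e.g. spark.sql.extensions) can be read the same way,
# but calling spark.conf.set() on them raises an error at runtime.
```

Note that setting a core config like spark.driver.memory after the session exists is not rejected in all code paths, but it has no effect, since the driver has already been launched.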


Solution

  • Spark SQL has both static and runtime configurations. Consult the online docs to see whether a particular config has context, session, or query scope.

    Runtime SQL configurations are per-session, mutable Spark SQL configurations. They can be set with initial values by the config file and command-line options with --conf/-c prefixed, or by setting SparkConf that are used to create SparkSession. Also, they can be set and queried by SET commands and reset to their initial values by the RESET command, or by SparkSession.conf's setter and getter methods at runtime.

    Static SQL configurations are cross-session, immutable Spark SQL configurations. They can be set with final values by the config file and command-line options with --conf/-c prefixed, or by setting SparkConf that are used to create SparkSession. External users can query the static SQL config values via SparkSession.conf or via the SET command, e.g. SET spark.sql.extensions;, but cannot set/unset them.
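The difference is easy to see from a spark-sql session (or spark.sql(...)): runtime configs such as spark.sql.shuffle.partitions can be changed mid-session, while attempting to SET a static config raises an error (in Spark 3.x, an AnalysisException along the lines of "Cannot modify the value of a static config"). The extension class name below is hypothetical:

```sql
-- Runtime SQL config: mutable within the session
SET spark.sql.shuffle.partitions = 50;   -- change it
SET spark.sql.shuffle.partitions;        -- query the current value
RESET spark.sql.shuffle.partitions;      -- restore the initial value

-- Static SQL config: queryable but immutable at runtime
SET spark.sql.extensions;                -- querying works
-- SET spark.sql.extensions = x.y.MyExtensions;  -- fails: cannot modify a static config
```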