Tags: scala, apache-spark

Can I use the same SparkSession in different threads?


In my Spark app I register many temp views to read datasets and then use them in one huge SQL expression, like this:

// Register each configured dataset as a temp view, one after another
for (view <- cfg.views)
  spark.read.format(view.format).load(view.path).createTempView(view.name)

spark.sql("...")

With JSON, parsing and preparing all the views one by one takes a long time. Can I use the same SparkSession from multiple threads?


Solution

  • SparkSession has been thread-safe since 2.0, and SparkContext is thread-safe as well. That doesn't rule out other problems with concurrent use, especially in your own code (e.g. maps, UDFs, etc.).

    As for the SessionCatalog (where the temp view is "stored"), it is designed to be thread-safe and uses synchronized blocks for access to tempViews.

    It's still possible that something in the LogicalPlans isn't thread-safe (whether in your code or because of a bug in Spark), although Spark aims for thread safety there, since each execution is isolated.
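
To make this concrete, here is a minimal sketch of registering the views from the question in parallel on one shared SparkSession, using Scala Futures on a dedicated thread pool. The names ViewCfg, ParallelViews, registerAll and the parallelism parameter are hypothetical and only mirror the fields used in the question (view.name, view.format, view.path); treat it as one possible approach rather than the definitive way to do it.

```scala
import java.util.concurrent.Executors

import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

import org.apache.spark.sql.SparkSession

// Hypothetical config type mirroring the fields used in the question.
final case class ViewCfg(name: String, format: String, path: String)

object ParallelViews {
  // Loads each view in its own task and registers it on the shared session.
  // Blocks until all views are registered; if any load fails, the failure
  // is rethrown to the caller.
  def registerAll(spark: SparkSession, views: Seq[ViewCfg], parallelism: Int = 4): Unit = {
    // Dedicated pool (an assumption of this sketch) so blocking loads
    // do not starve the global ExecutionContext.
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val tasks = views.map { view =>
        Future {
          // For JSON without an explicit schema, load() has to scan the
          // data to infer one, which is typically the slow step.
          spark.read.format(view.format).load(view.path).createTempView(view.name)
        }
      }
      Await.result(Future.sequence(tasks), Duration.Inf)
    } finally {
      pool.shutdown()
    }
  }
}
```

With this in place, the loop from the question would become ParallelViews.registerAll(spark, views), followed by the usual spark.sql("...") once it returns. Only the view registration runs concurrently here; the final query still sees all views because they live in the same session.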