I am transferring some code from a Databricks notebook to a local Jupyter notebook. The following code works in the Databricks notebook but fails locally:
res = sc.broadcast(spark.read.table(my_table))
Here is my local code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
res = sc.broadcast(spark.read.table(my_table))
With the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-67-dface594b1d3> in <module>
----> 1 ccode_dict = sc.broadcast(spark.read.table(my_table))
AttributeError: 'SparkSession' object has no attribute 'broadcast'
Is there any alternative to sc.broadcast()?
I am using Databricks connect to run my code locally: https://docs.databricks.com/dev-tools/databricks-connect.html
The main issue is that creating a SparkSession also creates a SparkContext. So if you then instantiate a new context yourself with:
sc = SparkContext()
it conflicts with the one the SparkSession already created. Instead, create the SparkSession first and retrieve its SparkContext from it. Here is the code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate() # Create Spark Session
sc = spark.sparkContext # Retrieve the Spark Context from the Spark Session
# You can now use broadcast from the spark context
res = sc.broadcast(spark.read.table(my_table))