I have used sc.broadcast for lookup files to improve performance.
I also learned that there is a function called broadcast in Spark SQL functions.
What is the difference between the two?
Which one should I use for broadcasting reference/lookup tables?
If you want to achieve a broadcast join in Spark SQL, you should use the broadcast
function (combined with the desired spark.sql.autoBroadcastJoinThreshold
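configuration). It marks the given DataFrame as small enough to broadcast, so the planner can use a broadcast join when the query is executed.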
SparkContext.broadcast is used to handle local objects: it ships a read-only copy of a driver-side value to the executors. It can still be used alongside Spark DataFrames, for example inside a UDF.