Tags: apache-spark, spark-streaming

Spark broadcast variable lifetime


I am running a Spark Streaming job, and as part of it I create multiple broadcast variables.

So, I have two questions about it:

1. Is there any function that lists all broadcast variables, the way spark.getPersistentRDDs lists all persisted RDDs?
2. If we do not destroy the broadcast variables ourselves, will Spark delete them after they have gone unused for a certain period of time?


Solution

    1. Spark does not provide a function that lists all broadcast variables the way getPersistentRDDs lists persisted RDDs. One workaround is to store each broadcast variable in a list or queue when you create it, so that you can refer to it later.
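    2. Spark has a ContextCleaner that removes a broadcast variable once it is no longer in use. It tracks each broadcast through a weak reference and cleans it up after the driver-side object has been garbage-collected, so the trigger is whether your code can still reach the variable, not a fixed idle timeout. On a long-running driver Spark also requests a periodic GC (spark.cleaner.periodicGC.interval, 30 minutes by default) so that this cleanup actually gets a chance to run. A broadcast that is still held in a list or queue therefore stays alive until you drop the reference or call destroy() on it.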
      https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L233
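The "store it yourself" workaround from point 1 could be sketched roughly as below. This is only an illustration, not a Spark API: the `BroadcastRegistry` class and its method names are hypothetical, and it assumes the handle returned by the broadcast function has a `destroy()` method, as PySpark's `Broadcast` does. Because the registry holds a strong reference to every handle, automatic cleanup would not be expected to reclaim them, so the sketch destroys them explicitly.

```python
from typing import Any, Callable, Dict, List


class BroadcastRegistry:
    """Hypothetical helper: remembers every broadcast handle by name,
    since Spark itself offers no call that lists them."""

    def __init__(self, broadcast_fn: Callable[[Any], Any]) -> None:
        # broadcast_fn is assumed to be SparkContext.broadcast (e.g. sc.broadcast).
        self._broadcast_fn = broadcast_fn
        self._handles: Dict[str, Any] = {}

    def create(self, name: str, value: Any) -> Any:
        """Broadcast `value` under `name`; an older handle with the same
        name is destroyed first so it does not linger on the executors."""
        if name in self._handles:
            self.destroy(name)
        handle = self._broadcast_fn(value)
        self._handles[name] = handle
        return handle

    def names(self) -> List[str]:
        """Return the names of all broadcasts still tracked, sorted."""
        return sorted(self._handles)

    def destroy(self, name: str) -> None:
        """Stop tracking `name` and release its data via handle.destroy()."""
        handle = self._handles.pop(name)
        handle.destroy()

    def destroy_all(self) -> None:
        """Destroy every tracked broadcast, e.g. at job shutdown."""
        for name in list(self._handles):
            self.destroy(name)


# Usage with a real SparkContext `sc` (assumed to exist already):
#   registry = BroadcastRegistry(sc.broadcast)
#   lookup = registry.create("lookup", {"a": 1})
#   registry.names()        # names of the broadcasts created so far
#   registry.destroy_all()  # explicit cleanup instead of waiting for GC
```

In a streaming job that rebroadcasts refreshed data every few batches, reusing the same name on each `create` call keeps only the latest copy alive.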