Is it necessary to broadcast an object member in Spark?

Say I have an object and I need to make some operations towards the member of this object: arr.

object A {
  val arr = (0 to 1000000).toList
  def main(args: Array[String]): Unit = {
    //...init spark context
    val rdd: RDD[Int] = ...
    rdd.map(arr.contains(_)).saveAsTextFile...
  }
}

What is the difference between broadcasted arr and not broadcasted? i.e.

val arrBr = sc.broadcast(arr)
rdd.map(arrBr.value.contains(_))

and

rdd.map(arr.contains(_))

In my opinion, the object A is a singleton object, so it will be transferred through the nodes in Spark.

Is it necessary to use broadcast in this scenario?

Solution

In the case

rdd.map(arr.contains(_))

arr is serialized shipped for each task

while in

val arrBr = sc.broadcast(arr)
rdd.map(arrBr.value.contains(_))

this is only done once per executor.

Therefore you should use broadcast when dealing with large datastructures.