Tags: scala, serialization, apache-spark, implicit

implicit val serialization when using global object in spark-shell


It's not clear to me why the (non-serializable) implicit val gets serialized (exception thrown) here:

implicit val sc2:SparkContext = sc
val s1 = "asdf"
sc.parallelize(Array(1,2,3)).map(x1 => s1.map(x => 4))

but not when the string value is written inline inside the closure instead of referenced through s1:

implicit val sc2:SparkContext = sc
sc.parallelize(Array(1,2,3)).map(x1 => "asdf".map(x => 4))

My use case is obviously more complicated but I've boiled it down to this issue.

(The solution is to define the implicit val as @transient)


Solution

  • The scope here is the spark-shell REPL. In that case, sc2 (and any other implicit val defined in the top-level REPL scope) is only serialized when it is implicit AND another val from that scope is used in the RDD operation. This makes sense because implicit values need to be available globally, and so they are automatically serialized to all worker nodes.
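
A minimal sketch of the @transient workaround mentioned in the question, assuming the same spark-shell session (sc is the SparkContext the shell provides):

@transient implicit val sc2: SparkContext = sc   // field is skipped during closure serialization
val s1 = "asdf"
sc.parallelize(Array(1, 2, 3)).map(x1 => s1.map(x => 4))   // no longer throws a serialization exception

With @transient, the REPL line object holding sc2 may still be pulled into the closure because s1 is referenced, but the SparkContext field itself is not serialized, so the job can run on the workers.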