While searching for the best serialization techniques for Apache Spark, I found the link below: https://github.com/scala/pickling#scalapickling. It states that serialization in Scala will be faster and more automatic with this framework.
Scala Pickling also has the following advantages (ref: https://github.com/scala/pickling#what-makes-it-different). So I wanted to know whether Scala Pickling (`PickleSerializer`) can be used in Apache Spark instead of `KryoSerializer`.

Thanks in advance, and forgive me if I am wrong.
Note: I am using the Scala language to code my Apache Spark (version 1.4.1) application.
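For reference, my understanding is that the serializer is swapped through the `spark.serializer` configuration key; a minimal sketch (the `PickleSerializer` class name in the comment is hypothetical, since no such serializer ships with Spark):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Spark 1.x: the serializer used for shuffles and caching is set via config.
val conf = new SparkConf()
  .setAppName("serializer-demo")
  .setMaster("local[*]")
  // Kryo is the high-performance serializer that ships with Spark:
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // A pickling-based serializer would presumably be plugged in the same way,
  // as a class extending org.apache.spark.serializer.Serializer, e.g.:
  // .set("spark.serializer", "com.example.PickleSerializer")  // hypothetical

val sc = new SparkContext(conf)
```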
I visited Databricks for a couple of months in 2014 to try to incorporate a PicklingSerializer into Spark somehow, but I couldn't find a way to feed the type information that scala/pickling needs into Spark without changing Spark's interfaces. At the time, changing interfaces in Spark was a no-go. For example, RDDs would need `Pickler[T]` type information in their interface in order for the generation mechanism in scala/pickling to kick in.
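To illustrate, here is a minimal sketch of the scala/pickling API (pickling 0.10.x assumed); the `persistPickled` signature in the comments is hypothetical and only shows the kind of context bound Spark's interfaces would have needed:

```scala
import scala.pickling.Defaults._
import scala.pickling.json._

case class Person(name: String, age: Int)

// The compiler generates Pickler[Person]/Unpickler[Person] instances at the
// call site, where the static type is known:
val pkl  = Person("Ada", 36).pickle
val back = pkl.unpickle[Person]

// Why Spark's interfaces would have had to change: the typeclass instances
// must be in scope wherever serialization happens, so RDD methods would have
// needed a context bound along these lines (hypothetical, not real Spark API):
//   def persistPickled[T: Pickler: Unpickler](rdd: RDD[T]): Unit = ???
```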
All of that changed, though, with Spark 2.0.0. If you use `Dataset`s or `DataFrame`s, you get so-called `Encoder`s. These are even more specialized than scala/pickling.
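As a minimal sketch (the `Person` case class is just an example), this is how an `Encoder` is resolved implicitly when you create a `Dataset` in Spark 2.x:

```scala
import org.apache.spark.sql.SparkSession

// Case classes get an Encoder derived for them automatically.
case class Person(name: String, age: Int)

val spark = SparkSession.builder()
  .appName("encoder-demo")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._ // brings implicit Encoders for case classes, primitives, ...

// Spark uses Encoder[Person] to serialize rows into its compact
// off-heap binary format instead of Java/Kryo serialization.
val people = Seq(Person("Ada", 36), Person("Alan", 41)).toDS()
people.filter(_.age > 38).show()
```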
Use `Dataset`s in Spark 2.x. They are much more performant on the serialization front than plain RDDs.