scala apache-spark serialization encoder kryo

Kryo vs Encoder vs Java Serialization in Spark?


Which serialization is used in which case?
The Spark documentation says:
It provides two serialization libraries:
1. Java (default) and
2. Kryo
So where do Encoders come from, and why are they not mentioned in that part of the documentation?
Databricks also says that Encoders perform faster for Datasets. What about RDDs, and how do all of these fit together? Which serializer should be used in which case?


Solution

    • Encoders are used only with Datasets.
    • Kryo is used internally by Spark.
    • Kryo and Java serialization are both available for you to choose from for shuffling your own data.
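
To make the first point concrete, here is a minimal sketch of how Encoders show up in Dataset code. The `Person` and `Legacy` types and the `EncoderSketch` object are hypothetical names made up for this example, and it assumes a local Spark session with the `spark-sql` dependency on the classpath. For case classes the Encoder is normally picked up implicitly; for a type Spark cannot derive an Encoder for, `Encoders.kryo` can be used as a fallback, at the cost of storing the whole object in a single binary column.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical record types, used only for illustration.
case class Person(name: String, age: Int)
class Legacy(val id: Int) // not a case class, so no Encoder is derived for it

object EncoderSketch {
  // Kryo-backed Encoder for a type Spark has no built-in Encoder for.
  val legacyEncoder: Encoder[Legacy] = Encoders.kryo[Legacy]

  // Case classes get their Encoder implicitly from spark.implicits._
  def people(spark: SparkSession): Dataset[Person] = {
    import spark.implicits._
    Seq(Person("Ann", 30), Person("Bob", 25)).toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("encoder-sketch")
      .master("local[*]") // assumption: local run, no cluster
      .getOrCreate()

    // Columnar schema derived from the case class fields.
    people(spark).printSchema()

    // The Kryo-backed Encoder is passed explicitly here.
    val legacy = spark.createDataset(Seq(new Legacy(1), new Legacy(2)))(legacyEncoder)
    // Schema is one opaque binary column rather than one column per field.
    legacy.printSchema()

    spark.stop()
  }
}
```

Note that neither Dataset in this sketch consults the `spark.serializer` setting to pick its row format: the Encoder decides how the objects are represented.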

    As for which one to use: Kryo is your best option if you don't use Datasets. If you do use Datasets, you don't really have a choice, because they are serialized with Encoders.
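
For the RDD case, switching from the default Java serializer to Kryo is a configuration change. The following is a minimal sketch, assuming a local run and a hypothetical `Click` record type and `KryoSketch` object invented for this example. `spark.serializer` and `registerKryoClasses` are the standard Spark settings for this; registering classes is optional but avoids writing the full class name with every object.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical record type, used only for illustration.
case class Click(userId: Long, url: String)

object KryoSketch {
  // Replace the default Java serializer with Kryo for RDD data.
  val conf: SparkConf = new SparkConf()
    .setAppName("kryo-sketch")
    .setMaster("local[*]") // assumption: local run, no cluster
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .registerKryoClasses(Array(classOf[Click]))

  // groupByKey forces a shuffle, so Click objects pass through the serializer.
  def clicksPerUser(sc: SparkContext): Map[Long, Int] =
    sc.parallelize(Seq(Click(1L, "/a"), Click(1L, "/b"), Click(2L, "/a")))
      .keyBy(_.userId)
      .groupByKey()
      .mapValues(_.size)
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(conf)
    try println(clicksPerUser(sc))
    finally sc.stop()
  }
}
```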