
What are the pros and cons of Java serialization vs. Kryo serialization?


In Spark, Java serialization is the default. If Kryo is that efficient, why is it not the default? Are there any cons to using Kryo, and in which scenarios should we use Kryo or Java serialization?


Solution

  • Here is a comment from the documentation:

    Kryo is significantly faster and more compact than Java serialization (often as much as 10x), but does not support all Serializable types and requires you to register the classes you’ll use in the program in advance for best performance.
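A minimal sketch of the size difference, assuming `spark-core` is on the classpath. It calls Spark's two built-in serializers (`org.apache.spark.serializer.JavaSerializer` and `KryoSerializer`) directly, outside of any job, on a hypothetical `Point` case class. No byte counts are shown because they depend on the Spark and Scala versions; run it to see the numbers on your setup.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.serializer.{JavaSerializer, KryoSerializer}

// Hypothetical record type, used only for this comparison.
case class Point(x: Int, y: Int)

object SerializerSizeComparison {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      // Registering Point lets Kryo write a small numeric ID instead of the full class name.
      .registerKryoClasses(Array(classOf[Point]))

    val point = Point(3, 4)

    // serialize returns a ByteBuffer; remaining() is the number of serialized bytes.
    val javaBytes = new JavaSerializer(conf).newInstance().serialize(point).remaining()
    val kryoBytes = new KryoSerializer(conf).newInstance().serialize(point).remaining()

    println(s"Java serialization: $javaBytes bytes")
    println(s"Kryo serialization: $kryoBytes bytes")
  }
}
```

Java serialization writes a class descriptor (class name, serialVersionUID, field metadata) into the stream, while registered Kryo classes are written as a compact ID followed by the field values, which is where most of the size gap comes from.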

    So it is not used by default because:

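    1. Not every java.io.Serializable type is supported out of the box: Kryo's default field serializer does not call Java serialization hooks such as custom writeObject/readObject methods, so a custom class that implements Serializable may not round-trip correctly with Kryo unless you register a suitable serializer for it.
    2. You need to register your custom classes to get the best performance. Unregistered classes still work, but Kryo then has to store the full class name with each object, which is wasteful.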
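Both points can be handled in configuration. The sketch below assumes `spark-core` (which bundles Kryo 4) is on the classpath; `Customer`, `LegacyRecord`, `MyKryoRegistrator` and `KryoConfig` are hypothetical names for illustration. `Customer` is registered with `registerKryoClasses`, while `LegacyRecord`, which relies on Java serialization hooks, is registered through a custom `KryoRegistrator` with Kryo's own `JavaSerializer` as a per-class fallback.

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.serializers.JavaSerializer
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical application class that Kryo handles fine with its default serializer.
case class Customer(id: Long, name: String)

// Hypothetical legacy class whose state lives in a transient field and is written
// only by Java serialization hooks. Kryo's default FieldSerializer skips transient
// fields and never calls these hooks, so the payload would be lost without a fallback.
class LegacyRecord(@transient var payload: String) extends Serializable {
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.defaultWriteObject()
    out.writeUTF(payload)
  }
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    payload = in.readUTF()
  }
}

// Spark instantiates this by class name, so it needs a no-arg constructor.
class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Delegate this one class to Java serialization inside the Kryo stream.
    kryo.register(classOf[LegacyRecord], new JavaSerializer())
  }
}

object KryoConfig {
  val conf: SparkConf = new SparkConf()
    .setAppName("kryo-example")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // Register plain classes so Kryo writes IDs instead of full class names.
    .registerKryoClasses(Array(classOf[Customer]))
    // Custom serializers go through a registrator.
    .set("spark.kryo.registrator", classOf[MyKryoRegistrator].getName)
    // Optional: fail fast on unregistered classes instead of writing their names.
    // .set("spark.kryo.registrationRequired", "true")
}
```

The Java fallback gives up Kryo's speed and compactness for that one class, so it is best kept to the few types that genuinely need it.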

    Note that, according to the documentation:

    Spark automatically includes Kryo serializers for the many commonly-used core Scala classes covered in the AllScalaRegistrar from the Twitter chill library.