What I am missing about kryo serialization?
Class1 and Class3 are not java serializable classes (no default constructors, neither getters and setters)
When I try to "use" an instance, that was created out of Spark context, inside Spark, I get a serialization issue, whether I register Classe3 as a Kryo class or not.
Works fine:
Dataset<Class1> ds = spark.createDataset(classes, Encoders.kryo(Class1.class));
Dataset<String> df = df.map((MapFunction<Class1, String>) class1 -> class1.getName(), Encoders.STRING());
df.show();
Serialization error caused by Class3
spark = SparkSession
.builder()
.master("local[*]")
.config(new SparkConf().registerKryoClasses(new Class[] {Class3.class}))
.appName("spark_test")
.getOrCreate();
Class3 class3 = Class3.getInstance();
Dataset<Class1> ds = spark.createDataset(classes, Encoders.kryo(Class1.class));
Dataset<String> df = df.map((MapFunction<Class1, String>) class1 -> class1.getName() + "-" class3.getId(), Encoders.STRING());
df.show();
Summarizing the discussion that happened in the comments to form an answer -
When you are trying to invoke a transformation, Spark driver will have to create and ship a closure for the code within that transformation to the executor(s) which is responsible for running it. In your case the line of code Class3 class3 = Class3.getInstance();
, is part of the Scala Object that enclose the creation and usage of Spark context to arrive at some result, a driver application. Hence when you try to pass class3
in the map transformation, driver is trying to serialize the enclosing Scala object. This scala object is not Serializable by itself unless you implement serializable hence you are getting Serialization issue.
Re:Kryo Serialization - Because you have registered your Class3 with Kryo, it will help you serialize the Class3 instance, however it won't serialize the Composite object which has Class3 instance as a variable.
Hence if you extract the value of class3.getId()
and then pass it to your map transformation, you do not need Class3 to be registered with Kryo.
In your example enclosing Scala object I mentioned above is same as Driver application.
Hope this helps.