I have a Java class with several String fields and one HashMap field. I serialize objects of this class with default Kryo serialization and store them in HBase. After reading them back, deserializing them inside a flatMap function on an RDD in Spark produces the following error. The same code segment works fine when Spark is not involved.
16/06/22 11:13:05 WARN TaskSetManager: Lost task 20.0 in stage 3.0 (TID 85, localhost): com.esotericsoftware.kryo.KryoException: Unable to find class: Dadaisme
at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:126)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at prSpark.EmPageRank$1.call(EmPageRank.java:227)
at prSpark.EmPageRank$1.call(EmPageRank.java:1)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:149)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:149)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Dadaisme
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:340)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
... 22 more
The stack trace reports com.esotericsoftware.kryo.KryoException: Unable to find class: Dadaisme, i.e. that a class named "Dadaisme" cannot be found. But "Dadaisme" is not a class in my program; it is a value stored in the HashMap field.
This exception occurred because of a version difference between the Kryo libraries used for serialization and deserialization. Spark used Kryo 2.x by default, while I had serialized the objects with the latest Kryo (3.x). The Kryo versions used for serialization and deserialization must match.
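One way to keep the versions aligned, if the objects have to be written outside Spark, is to build the writing-side Kryo from Spark's own KryoSerializer so both sides use the Kryo version that Spark bundles. A minimal sketch (class name and sample data are illustrative):

import java.io.ByteArrayOutputStream;
import java.util.HashMap;

import org.apache.spark.SparkConf;
import org.apache.spark.serializer.KryoSerializer;

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

public class SparkKryoRoundTrip {
    public static void main(String[] args) {
        // Obtain a Kryo instance from Spark's own KryoSerializer so the write
        // path (into HBase) and the read path (inside the Spark job) use the
        // exact Kryo version that Spark ships with.
        SparkConf conf = new SparkConf().setAppName("kryo-version-demo");
        Kryo kryo = new KryoSerializer(conf).newKryo();

        HashMap<String, Integer> data = new HashMap<>();
        data.put("Dadaisme", 1); // plain map data, not a class name

        // Write with Spark's Kryo.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Output output = new Output(baos);
        kryo.writeClassAndObject(output, data);
        output.close();

        // Read back with a Kryo of the same version, as would happen in flatMap.
        Input input = new Input(baos.toByteArray());
        @SuppressWarnings("unchecked")
        HashMap<String, Integer> roundTripped =
                (HashMap<String, Integer>) kryo.readClassAndObject(input);
        input.close();

        System.out.println(roundTripped); // {Dadaisme=1}
    }
}

Alternatively, pin the standalone writer's Kryo dependency to the same artifact version that Spark already brings in on its classpath.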