Search code examples
javascalaapache-sparkserializationkryo

Registering complex scala classes with Kryo in spark-shell and Scala jars


I have a new spark 2.3.1 application... it ran fine for a while, but now it's broken as data volume has increased.

The orginal error is a kryo serialization problem... com.esotericsoftware.kryo.KryoException: java.lang.NegativeArraySizeException at failure. The weirdest part is that it's not consistent... if I run identical code, on identical data, on my non-shared cluster, it may or may not fail and seems totally random.

I've increased spark.kryoserializer.buffer.max up to 2047m (the max) from 256m (my default) just to see what happens, and it sill fails with the same error. I've also tried increasing the parallelism in RDD's that fail (6x per executor from 3x), and no luck.

Now I'm trying to run code snippets in spark-shell --conf spark.kryo.registrationRequired=true to find all the classes I need to register to shrink the size when serializing, and then incrementally adding them to the --conf 'spark.kryo.classesToRegister=org.myOrg.MyClass1,org.myOrg.MyClass2' and will later move them into the jar (conf.registerKryoClasses(Array(classOf[MyClass1], classOf[MyClass2]))) after I've found them all (there's way more than I expected).

There's one that I absolutely cannot figure out how to register though. The error looks like this...

Caused by: java.lang.IllegalArgumentException: Class is not registered: org.myOrg.MyClass[]
Note: To register this class use: kryo.register(org.myOrg.MyClass[].class);

I suspect it's an argument Iterable[MyClass] for some other class like class MyOuterClass(val mcs: Iterable[MyClass]) but everything I try to register fails to work. I believe MyClass[] is a java.lang.Array[MyClass] but I've tried registering every combination of Array, Iterable, [], etc. that I can think of, and no luck getting it registered.

Any advice for the syntax to register Iterable, List, TupleN in both the command line launch of spark-shell and ultimately in the code? Ultimately I'll have some very nested tuples as well, but I haven't gotten that far yet.

The closest result I can find in stackoverflow is here, but I can't get make this work for me either. Require kryo serialization in Spark (Scala)

Thanks in advance.

EDIT

Just to clarify... after successfully registering MyClass I still get an error message Class is not registered: MyClass[] and I can't figure out what the [] are at the end or how to register to make those go away.


Solution

  • If you class name is MyClass then try registering with [LMyClass;

    conf.registerKryoClasses(Array( Class.forName("[LMyClass;")))

    it should load and register array class for MyClass