Search code examples
scalaapache-sparkserializationkryo

Spark Internal class Kryo registration


I'm new to Spark I'm using 2.4.4 with kryo. The spark job will write around 100 part files and then fails by throwing the following exception

Caused by: java.lang.IllegalArgumentException: Class is not registered: 
org.apache.spark.sql.execution.datasources.WriteTaskResult
Note: To register this class use: 
kryo.register(org.apache.spark.sql.execution.datasources.WriteTaskResult.class);

As suggested in the exception I can register

kryo.register(org.apache.spark.sql.execution.datasources.WriteTaskResult.class);

But the problem is, it is an internal Spark class, the question I have is, is it ok to register this internal class? shouldn't it be taken care of by Kryo or Spark itself as long as it is an internal class? What is the right way to fix this problem?

Thanks, Raj


Solution

  • I registered the following classes and it worked

    kryo.register(classOf[org.apache.spark.sql.execution.datasources.WriteTaskResult])
    kryo.register(classOf[org.apache.spark.sql.execution.datasources.ExecutedWriteSummary])
    kryo.register(classOf[org.apache.spark.sql.execution.datasources.BasicWriteTaskStats])
    kryo.register(classOf[org.apache.spark.internal.io.FileCommitProtocol])
    kryo.register(classOf[org.apache.spark.sql.catalyst.expressions.UnsafeRow])
    

    Thanks Raj