Tags: scala, apache-spark, kryo

How to register kryo classes in the spark-shell


The SparkConf has the method registerKryoClasses:

def registerKryoClasses(classes: Array[Class[_]]): SparkConf = { .. }

However, it is not available/exposed via the RuntimeConfig facade returned by the SparkSession.conf attribute:

@transient lazy val conf: RuntimeConfig = new RuntimeConfig(sessionState.conf)

Here is more about RuntimeConfig:

/**
 * Runtime configuration interface for Spark. To access this, use `SparkSession.conf`.
 *
 * Options set here are automatically propagated to the Hadoop configuration during I/O.
 *
 * @since 2.0.0
 */
@InterfaceStability.Stable
class RuntimeConfig private[sql](sqlConf: SQLConf = new SQLConf) {

There is a clear workaround for this when creating our own SparkSession: we can invoke set(key, value) and registerKryoClasses on the SparkConf that is provided to the SparkSession builder:

val mySparkConf = new SparkConf().set(someKey, someVal)
mySparkConf.registerKryoClasses(Array(classOf[Array[InternalRow]]))
SparkSession.builder.config(mySparkConf).getOrCreate()

And then one that is not so clear:

conf.registerKryoClasses(Array(classOf[scala.reflect.ClassTag$$anon$1]))

But when running the Spark shell the sparkSession/sparkContext are already created. So then how can the non-runtime settings be put into effect?

The particular need here is :

sparkConf.registerKryoClasses(Array(classOf[org.apache.spark.sql.Row]))

When attempting to set that on the SQLConf available to the Spark session object, we get this exception:

scala> spark.conf.registerKryoClasses(Array(classOf[Row]))

error: value registerKryoClasses is not a member of org.apache.spark.sql.RuntimeConfig
       spark.conf.registerKryoClasses(Array(classOf[Row]))

So then how can the Kryo serializers be registered in the spark-shell?
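One avenue worth noting (an assumption on my part, not something verified in this question): since Kryo registration is a startup-time setting, it can in principle be supplied as `--conf` flags when launching spark-shell, via the standard `spark.serializer` and `spark.kryo.classesToRegister` properties, so it takes effect before the SparkContext is created:

```shell
# Sketch: pass serializer settings at shell launch, before the
# SparkContext exists, instead of trying to set them at runtime.
spark-shell \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.classesToRegister=org.apache.spark.sql.Row
```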


Solution

  • The following is not an exact answer to [my own] question - but it seems to serve as a workaround for the present specific predicament:

    implicit val generalRowEncoder: Encoder[Row] = org.apache.spark.sql.Encoders.kryo[Row]
    

    Having this implicit in scope seems to register the classes with kryo directly on the SparkConf.
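As a sketch of how this looks in the spark-shell (assuming the shell's pre-built `spark` session; the sample data is made up for illustration):

```scala
import org.apache.spark.sql.{Encoder, Encoders, Row}

// Kryo-backed encoder for Row; with this implicit in scope,
// Dataset[Row] creation that would otherwise fail to resolve an
// encoder picks up this one instead.
implicit val generalRowEncoder: Encoder[Row] = Encoders.kryo[Row]

// Hypothetical usage: build a Dataset of generic Rows from local data.
val ds = spark.createDataset(Seq(Row(1, "a"), Row(2, "b")))
```

Note that a Kryo encoder stores each Row as a single opaque binary column, so this is a serialization workaround rather than a schema-aware encoder.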