Tags: serialization, cassandra, apache-spark, kryo

Serializing Cassandra Tables with Kryo and Spark


I'm trying to test Kryo serialization with Apache Spark in order to measure execution times with and without serialization, and to save the Kryo object stream to disk to simulate a cache in Spark.

The test I've designed stores a Cassandra table in a serialized CassandraRDD object.

The Scala code that generates the CassandraRDD follows:

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object SparkCassandra {
  def main(args: Array[String]): Unit = {

    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", "mycassandraip")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext("local", "test", conf)

    // Access the Cassandra table
    val kvRDD = sc.cassandraTable("test", "kv")

    kvRDD.collect().foreach(println)
  }
}

This code works, but I suspect that kvRDD, which is a CassandraRDD object, is not actually being serialized.
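For instance, to force the data through the configured serializer, I believe I could persist the RDD at a serialized storage level (a sketch using the standard Spark StorageLevel API; I haven't verified it end to end):

import org.apache.spark.storage.StorageLevel

// MEMORY_ONLY_SER stores each partition as a serialized byte array,
// so the rows must pass through the configured serializer (Kryo here);
// a plain collect() may not exercise it at all in local mode.
kvRDD.persist(StorageLevel.MEMORY_ONLY_SER)
kvRDD.count() // materialize the cached, serialized partitions

// StorageLevel.DISK_ONLY would instead write the serialized
// partitions to local disk.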

Is there any rule about what can and cannot be serialized with Kryo? How would I register this class with kryo.register?

If I try to register it with kryo.register(classOf[CassandraRDD]), I get the following compile error:

Error:(11, 27) class CassandraRDD takes type parameters
    kryo.register(classOf[CassandraRDD])
                      ^

Please note I'm very new to Scala and Kryo.

Thank you so much in advance.


Solution

  • CassandraRDD takes a type parameter, so supply one when registering; type erasure means the argument itself doesn't matter:

     kryo.register(classOf[CassandraRDD[Any]])
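
  • More generally, Spark lets you bundle registrations in a custom KryoRegistrator and point spark.kryo.registrator at it. A minimal sketch (the name MyRegistrator is illustrative; KryoRegistrator and the spark.kryo.registrator key are standard Spark API):

     import com.esotericsoftware.kryo.Kryo
     import org.apache.spark.serializer.KryoRegistrator
     import com.datastax.spark.connector.CassandraRow
     import com.datastax.spark.connector.rdd.CassandraRDD

     // MyRegistrator is a hypothetical name; use your own class
     // and its fully qualified name in the config below.
     class MyRegistrator extends KryoRegistrator {
       override def registerClasses(kryo: Kryo): Unit = {
         // The class takes a type parameter, so supply one;
         // erasure means any argument registers the same class.
         kryo.register(classOf[CassandraRDD[Any]])
         // The row type is what actually flows through the serializer.
         kryo.register(classOf[CassandraRow])
       }
     }

    Then wire it into the SparkConf from the question:

     conf.set("spark.kryo.registrator", "MyRegistrator")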