Tags: scala, apache-spark, rdd, case-class

Spark Scala convert RDD with Case Class to simple RDD


This is fine:

case class trans(atm : String, num: Int)
    
val array = Array((20254552,"ATM",-5100), (20174649,"ATM",5120))
val rdd = sc.parallelize(array)
val rdd1 = rdd.map(x => (x._1, trans(x._2, x._3)))

How can I convert this back to a simple RDD like the original rdd?

E.g. rdd: org.apache.spark.rdd.RDD[(Int, String, Int)]

I can do this, for sure:

val rdd2 = rdd1.mapValues(v => (v.atm, v.num)).map(x => (x._1, x._2._1, x._2._2))

But what if the case class has many fields? Ideally I'd like to do this dynamically, without listing every field by hand.
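
For instance, with a hypothetical wider record (made up here purely for illustration), the hand-written flattening grows with every field:

// Hypothetical wider record, just to illustrate the problem:
case class bigTrans(atm: String, num: Int, code: String, region: String, amount: Double)

val bigRdd = sc.parallelize(Array((1, bigTrans("ATM", 1, "X", "EU", 9.5))))

// Every field has to be listed by hand:
val bigFlat = bigRdd.map { case (k, v) => (k, v.atm, v.num, v.code, v.region, v.amount) }
// bigFlat: org.apache.spark.rdd.RDD[(Int, String, Int, String, String, Double)]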


Solution

  • Not sure exactly how generic you want to go, but for your example of an RDD[(Int, trans)] you can use the unapply method of the trans companion object to flatten the case class into a tuple (a more generic sketch follows at the end of this answer).

    So, if you have your setup:

    case class trans(atm : String, num: Int)
    
    val array = Array((20254552,"ATM",-5100), (20174649,"ATM",5120))
    val rdd = sc.parallelize(array)
    val rdd1 = rdd.map(x => (x._1, trans(x._2, x._3)))
    

    You can do the following:

    import shapeless.syntax.std.tuple._   // brings +: (prepend) for plain tuples into scope

    val output = rdd1.map {
      case (myInt, myTrans) =>
        // trans.unapply returns Option[(String, Int)]; +: prepends the Int key
        myInt +: trans.unapply(myTrans).get
    }
    output
    res15: org.apache.spark.rdd.RDD[(Int, String, Int)]
    

    We import shapeless.syntax.std.tuple._ so that +: is available on plain tuples, letting us prepend the Int to the flattened tuple (the myInt +: trans.unapply(myTrans).get operation).
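
    If you want this to work for arbitrary case classes rather than trans specifically, one possible direction (a sketch, not part of the original answer, assuming shapeless is already on the classpath as above) is to go through shapeless's Generic, which turns any case class into an HList that a Tupler can then convert into a tuple:

    import shapeless._
    import shapeless.ops.hlist.Tupler

    // Hypothetical generic helper: case class -> HList -> tuple.
    // Limited to case classes with at most 22 fields (the tuple arity limit).
    def toTuple[A, L <: HList, T](a: A)(
        implicit gen: Generic.Aux[A, L], // derives the HList representation of A
        tupler: Tupler.Aux[L, T]         // converts that HList into a tuple
    ): T = tupler(gen.to(a))

    // Usage with the example above (yields a nested rather than flat tuple):
    // rdd1.map { case (k, t) => (k, toTuple(t)) }
    // gives an RDD[(Int, (String, Int))]

    This is only a sketch: the key ends up next to a nested tuple rather than flattened into it (prepending it generically needs more shapeless machinery, e.g. the same +: syntax used above), and whether the derived instances serialize cleanly inside a Spark closure is something you'd need to verify.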