This is fine:
case class trans(atm: String, num: Int)
val array = Array((20254552,"ATM",-5100), (20174649,"ATM",5120))
val rdd = sc.parallelize(array)
val rdd1 = rdd.map(x => (x._1, trans(x._2, x._3)))
How can I convert it back to a simple RDD like rdd again?
E.g. rdd: org.apache.spark.rdd.RDD[(Int, String, Int)]
I can do this, for sure:
val rdd2 = rdd1.mapValues(v => (v.atm, v.num)).map(x => (x._1, x._2._1, x._2._2))
but what if the case class has many fields? Can the flattening be done generically, without spelling out every field?
Not sure exactly how generic you want to go, but in your example of an RDD[(Int, trans)], you can make use of the unapply method of the trans companion object in order to flatten your case class to a tuple.
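For a quick illustration of what unapply gives you (plain Scala 2, no Spark needed; a minimal sketch):

case class trans(atm: String, num: Int)

// In Scala 2, the compiler-generated companion's unapply returns Option[(String, Int)]
val flat: Option[(String, Int)] = trans.unapply(trans("ATM", -5100))
// flat == Some(("ATM", -5100))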
So, if you have your setup:
case class trans(atm: String, num: Int)
val array = Array((20254552,"ATM",-5100), (20174649,"ATM",5120))
val rdd = sc.parallelize(array)
val rdd1 = rdd.map(x => (x._1, trans(x._2, x._3)))
You can do the following:
import shapeless.syntax.std.tuple._
val output = rdd1.map {
  case (myInt, myTrans) =>
    // unapply yields Option[(String, Int)]; +: (from shapeless) prepends the Int,
    // producing an (Int, String, Int)
    myInt +: trans.unapply(myTrans).get
}
scala> output
res15: org.apache.spark.rdd.RDD[(Int, String, Int)]
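As a sanity check with the same local data as above, collecting should give back the flat rows:

output.collect()
// expected: Array((20254552,ATM,-5100), (20174649,ATM,5120))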
We're importing shapeless.syntax.std.tuple._ in order to be able to make a tuple from our Int plus the flattened tuple (the myInt +: trans.unapply(myTrans).get operation).
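On the "many fields" part of the question: shapeless can also derive the flattening itself, so you don't need an unapply call per class. This is only a sketch, assuming shapeless 2.3.x on the classpath (and I haven't verified that the implicits captured by the closure serialize cleanly on a real cluster):

import shapeless.syntax.std.product._ // adds .toTuple to case classes
import shapeless.syntax.std.tuple._   // adds +: on tuples, as above

// Flattens any case class whose field count fits in a Scala tuple (22 max),
// with no hand-written unapply call per class.
val output2 = rdd1.map { case (myInt, myTrans) =>
  myInt +: myTrans.toTuple
}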