scala apache-spark rdd

Convert RDD[Any] holding List(String, ListBuffer[String]) records to RDD[(String, Seq[String])]


I have an RDD whose element type is Any, for example:

Array(List(Mathematical Sciences, ListBuffer(applications, asymptotic, largest, enable, stochastic)))

I want to convert it to an RDD of type RDD[(String, Seq[String])].

I tried:

val rdd = sc.makeRDD(strList)
case class X(titleId: String, terms: List[String])

val df = rdd.map { case Array(s0, s1) => X(s0, s1) }.toDF()

I spent a long time trying, without success.


Solution

  • You can use:

    import org.apache.spark.rdd.RDD
    import scala.collection.mutable.ListBuffer
    val result: RDD[(String, Seq[String])] =
      rdd.map { case List(s0: String, s1: ListBuffer[String]) => (s0, s1.toList) }
    

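    But note that any record in the input RDD[Any] that doesn't match this pattern will throw a scala.MatchError at runtime, because the element types can't be checked at compile time. Also, because of type erasure, the ListBuffer[String] pattern only verifies that the value is a ListBuffer, not that its elements are strings.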
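
If skipping malformed records is preferable to failing with a MatchError, one option is to express the conversion as a partial function and pass it to collect instead of map. The following is only a sketch under assumptions: the names SafeConvert and toPair are hypothetical, and the sample records are made up for illustration. It runs on a plain Scala Seq[Any] so that it needs no Spark dependency.

```scala
import scala.collection.mutable.ListBuffer

object SafeConvert {
  // Hypothetical helper: defined only for records shaped like
  // List(String, ListBuffer(...)). The buffer's element type is erased
  // at runtime, so every term is checked individually in the guard.
  val toPair: PartialFunction[Any, (String, Seq[String])] = {
    case List(title: String, terms: ListBuffer[_])
        if terms.forall(_.isInstanceOf[String]) =>
      (title, terms.map(_.asInstanceOf[String]).toList)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample data standing in for the contents of the RDD[Any].
    val records: Seq[Any] = Seq(
      List("Mathematical Sciences", ListBuffer("applications", "asymptotic")),
      "not a list",                  // wrong shape, skipped
      List("Bad", ListBuffer(1, 2))  // terms are not strings, skipped
    )

    // collect keeps only the records the partial function is defined for.
    val pairs: Seq[(String, Seq[String])] = records.collect(toPair)
    pairs.foreach(println)
  }
}
```

Only the first sample record survives the collect. RDD also has a collect overload that takes a partial function, so, assuming rdd is the RDD[Any] from the question, rdd.collect(SafeConvert.toPair) should yield an RDD[(String, Seq[String])] that silently drops non-matching records. Whether dropping them is acceptable depends on the data; if bad records should be noticed, keeping the map version and letting it fail may be the better choice.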