Tags: scala, apache-spark, apache-spark-sql, rdd

How to convert an RDD into a 2d array in Scala?


I am using Apache Spark for a project. I have a DataFrame that I have already converted into an RDD, and I now need to turn it into a 2d array. Below is the code I have written so far. What should I do next?

val x: List[List[String]] = df.select(columnNames(0), (columnNames.drop(1): _*)).rdd.collect()

Here df is a DataFrame.
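
For context, df is built from something like the following (the schema and the values here are only an illustration; my real data is different, but the shape is the same):

import org.apache.spark.sql.SparkSession

// Toy stand-in for the real data: two numeric columns and one string column,
// with the column names kept in a Seq named columnNames.
val spark = SparkSession.builder().appName("example").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(
  (1L, 10L, "a"),
  (2L, 20L, "b")
).toDF("id", "count", "label")

val columnNames = Seq("id", "count", "label")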


Solution

  • After discussing your problem in the chat, here is the solution:

    import org.apache.spark.sql.Row

    val x: List[List[String]] = df.select(columnNames.head, columnNames.tail: _*)
      .rdd.map { case r: Row =>
        // Render every field as a String, then collect the row into a List
        Row(r.getAs[Long](0).toString, r.getAs[Long](1).toString, r.getAs[String](2)).toSeq.map(_.asInstanceOf[String]).toList
      }.collect.toList
    

    Since I don't have a view of the actual data, remember that this is just an example; you can also fetch each column by its field name, e.g. r.getAs[String]("column1").
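
    For instance, assuming the illustrative id/count/label schema above, the same mapping written against field names rather than positions would look roughly like this; a final map also turns the nested lists into an actual 2d array, which is what you originally asked for:

    // Same idea, but reading columns by (illustrative) field names instead of positions
    val byName: List[List[String]] = df.select(columnNames.head, columnNames.tail: _*)
      .rdd.map { r =>
        List(
          r.getAs[Long]("id").toString,     // hypothetical numeric column
          r.getAs[Long]("count").toString,  // hypothetical numeric column
          r.getAs[String]("label")          // hypothetical string column
        )
      }.collect.toList

    // If you need a real 2d array rather than nested lists:
    val arr2d: Array[Array[String]] = byName.map(_.toArray).toArray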

    Another solution, which I'm not a big fan of, is:

    val x: List[List[String]] = df.select(columnNames.head, columnNames.tail: _*)
      .rdd.map { case r: Row =>
        // Join the row into one comma-separated string, then split it back apart
        r.mkString(",").split(",").toList
      }.collect.toList