I am using Apache Spark for a project. I have a DataFrame, which I have been able to convert into an RDD. I now need to convert it into a 2D array. Below is the code I have written so far. What should I do next?
val x: List[List[String]] = df.select(columnNames(0), columnNames.drop(1): _*).rdd.collect()
Here df is the DataFrame.
After discussing your problem in chat, here is the solution:
val x: List[List[String]] = df.select(columnNames.head, columnNames.tail: _*)
  .rdd
  .map { case r: Row =>
    // Convert the two Long columns and the String column to strings,
    // then turn the whole row into a List[String]
    Row(r.getAs[Long](0).toString, r.getAs[Long](1).toString, r.getAs[String](2))
      .toSeq.map(_.asInstanceOf[String]).toList
  }
  .collect()
  .toList
Since I don't have a view of the actual data, remember that this is just an example; you can also get each column by its field name instead of its index, for example: r.getAs[String]("column1")
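Stripped of the Spark machinery, the per-row conversion above can be sketched in plain Scala. The values here are hypothetical stand-ins for a Row holding two Long columns and one String column:

```scala
// Hypothetical stand-in for one Spark Row: (Long, Long, String)
val rowValues: Seq[Any] = Seq(42L, 7L, "hello")

// Same idea as the map body: every value becomes a String,
// and the row becomes a List[String]
val asStrings: List[String] = rowValues.map(_.toString).toList
// asStrings == List("42", "7", "hello")
```

Doing this for every Row in the RDD and collecting gives the List[List[String]] you are after.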
Another solution, which I'm not a big fan of, is:

val x: List[List[String]] = df.select(columnNames.head, columnNames.tail: _*)
  .rdd
  .map { case r: Row =>
    // Join all values into one comma-separated string, then split it back apart;
    // this breaks if any value itself contains a comma
    r.mkString(",").split(",").toList
  }
  .collect()
  .toList
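A quick illustration of why the mkString/split round trip is fragile: if any column value itself contains a comma, the split produces extra elements. The values below are made up for the demonstration:

```scala
// Three columns, but the String value contains a comma
val rowValues: Seq[Any] = Seq(1L, 2L, "a,b")

val cols: List[String] = rowValues.mkString(",").split(",").toList
// cols == List("1", "2", "a", "b") -- four elements from a three-column row
```

This is why the getAs-based version is the safer choice whenever the data can contain the separator character.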