apache-spark-sql union rdd lsh

How can I union all the DataFrames in an RDD[DataFrame] into a single DataFrame without a for loop, using Scala in Spark?


result is a Spark DataFrame with columns [uid: Int, vector: Vector]. The type of recomRes, however, is RDD[DataFrame]. How can I union all the results in recomRes into a single DataFrame?

val recomRes = result.rdd.map(row => {
    val uid = row.apply(0)
    val vec = row.getAs[Vector](1)
    brp
       .approxNearestNeighbors(vectors, vec, 5)
       .withColumn("uid", lit(uid))
       .select("uid", "aid", "distCol")
})

I have tried a for loop, but it is very slow.


Solution

  • Use the toDF() method after map.

    You need to import sqlContext.implicits._ (or spark.implicits._ when working with a SparkSession) for toDF() to be available.
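
    A minimal sketch of the toDF() pattern, assuming a local SparkSession and made-up (uid, value) data. The names spark, pairs and doubled are illustrative and do not come from the question. Note that toDF() needs an encoder for the element type, which exists for tuples, case classes and primitives but not for DataFrame, so the function passed to map has to return plain values.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("toDF-sketch")
  .getOrCreate()

// Brings toDF() into scope for RDDs of tuples and case classes.
import spark.implicits._

// Hypothetical data standing in for (uid, value) pairs.
val pairs = spark.sparkContext.parallelize(Seq((1, 0.5), (2, 0.75)))

// map returns plain tuples, so the resulting RDD can be converted with toDF().
val doubled = pairs
  .map { case (uid, value) => (uid, value * 2) }
  .toDF("uid", "doubled")

doubled.show()
```

    The snippet is written to be pasted into spark-shell. The column names passed to toDF() are optional; without them the columns are named _1, _2, and so on.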