Split String of RDD and combine with other RDD element in one statement


I can flatMap the second element of each tuple in the RDD without any problem:

val rdd = sc.parallelize( Seq( (1, "Hello how are you"),
                               (1, "I am fine"),
                               (2, "Yes you are")
                             )
                        )
val rdd2 = rdd.flatMap(x => x._2.split(" "))

However, I would like to pair x._1 with each split item of x._2 in the same statement, so that every element becomes a tuple (String, Int). For some reason I cannot see how to do it, and I do not want to convert to a DataFrame with an array column and explode it. Any ideas?


Solution

  • Map over the array that split returns and pair each item with the value you need:

    val rdd = sc.parallelize( Seq( (1, "Hello how are you"),
                                   (1, "I am fine"),
                                   (2, "Yes you are")
                                 )
                            )
    // Pair every word with its row's key, giving an RDD[(String, Int)]
    val rdd2 = rdd.flatMap(x => x._2.split(" ").map(item => (item, x._1)))
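To check the (String, Int) shape the question asks for without starting a Spark context, the same flatMap-plus-map pattern can be tried on a plain Scala Seq. This is only a sketch: the helper name pairWords is hypothetical, and the local Seq stands in for the RDD.

```scala
// Hypothetical helper: turn one (id, text) row into (word, id) tuples.
def pairWords(row: (Int, String)): Seq[(String, Int)] = {
  val (id, text) = row
  // split returns Array[String]; map pairs each word with the row's id
  text.split(" ").toSeq.map(word => (word, id))
}

// A local Seq standing in for the RDD from the question
val rows = Seq(
  (1, "Hello how are you"),
  (1, "I am fine"),
  (2, "Yes you are")
)

// flatMap flattens the per-row sequences into a single Seq[(String, Int)]
val pairs: Seq[(String, Int)] = rows.flatMap(pairWords)

pairs.foreach(println)
```

Because the helper returns a Seq, it should also be usable as rdd.flatMap(pairWords) on the RDD itself, assuming the RDD's element type is (Int, String) as in the question.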