I can flatMap over the second element of each tuple in the RDD without any problem:
val rdd = sc.parallelize(Seq(
  (1, "Hello how are you"),
  (1, "I am fine"),
  (2, "Yes you are")
))
val rdd2 = rdd.flatMap(x => x._2.split(" "))
However, I would like to attach x._1 to each split item of x._2 in the same step, so that every element becomes a tuple (String, Int). For some reason I cannot see how to do it, and I do not want to convert to a DataFrame with an array column and explode it. Any ideas?
Just iterate over the array (split result) and append the value you need:
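val rdd = sc.parallelize(Seq(
  (1, "Hello how are you"),
  (1, "I am fine"),
  (2, "Yes you are")
))
// Each element of rdd2 is a String such as "Hello+1";
// return (item, x._1) from the inner map instead to get a (String, Int) tuple.
val rdd2 = rdd.flatMap(x => x._2.split(" ").map(item => s"${item}+${x._1}"))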
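If you want an actual (String, Int) tuple rather than a concatenated string, the inner map can return a pair. Here is a minimal sketch of that variant. It uses a plain Scala Seq in place of the RDD so it runs without a SparkContext; the names data, pairs, id, text and token are only illustrative.

```scala
// A plain Seq stands in for the RDD (assumption, so no SparkContext is needed).
val data = Seq(
  (1, "Hello how are you"),
  (1, "I am fine"),
  (2, "Yes you are")
)

// Split each sentence and pair every token with the key of its record.
val pairs: Seq[(String, Int)] = data.flatMap { case (id, text) =>
  text.split(" ").map(token => (token, id))
}

// On the RDD from the question, the same function body should give an RDD[(String, Int)]:
// val rdd2 = rdd.flatMap { case (id, text) => text.split(" ").map(token => (token, id)) }
```

The pattern match on (id, text) is just a more readable alternative to x._1 and x._2; the logic is otherwise the same as in the answer above.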