scala apache-spark apache-spark-sql rdd

How to extract values from an array of string arrays in an RDD


val rdd: Array[Array[String]] = Array(
  Array("2345", "345", "fghj", "dfhg"),
  Array("2345", "3450", "fghj", "dfhg"),
  Array("23145", "1345", "fghj", "dffghg"),
  Array("23045", "345", "feghj", "adfhg")
)

This is my input. I need to extract the first two elements of each array as a key-value pair.

I would like to get the following output:

(2345,345)
(2345,3450)
(23145,1345)
(23045,345)

Solution

  • You can simply map each inner array to a tuple of its first two elements:

    rdd.map(array => (array(0), array(1)))
    //res0: Array[(String, String)] = Array((2345,345), (2345,3450), (23145,1345), (23045,345))
    

    If you want the output as a Map, add a .toMap call. Note that duplicate keys are collapsed and the last value wins, so the key 2345 keeps only 3450:

    rdd.map(array => (array(0), array(1))).toMap
    //res0: scala.collection.immutable.Map[String,String] = Map(2345 -> 3450, 23145 -> 1345, 23045 -> 345)
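
If some rows might have fewer than two elements, or if you need to keep every value for a repeated key such as 2345, something like the sketch below may help. It works on a plain Scala Array, as in the question, and does not use Spark. The names FirstTwoAsPairs, rows, firstTwo and grouped are made up for illustration only.

```scala
object FirstTwoAsPairs {
  // Same sample data as in the question (a plain Array, not a Spark RDD).
  val rows: Array[Array[String]] = Array(
    Array("2345", "345", "fghj", "dfhg"),
    Array("2345", "3450", "fghj", "dfhg"),
    Array("23145", "1345", "fghj", "dffghg"),
    Array("23045", "345", "feghj", "adfhg")
  )

  // collect with a pattern match skips rows that have fewer than two
  // elements instead of throwing ArrayIndexOutOfBoundsException.
  def firstTwo(data: Array[Array[String]]): Array[(String, String)] =
    data.collect { case Array(k, v, _*) => (k, v) }

  // Group the values by key so that duplicate keys are not overwritten,
  // which is what happens with .toMap.
  def grouped(data: Array[Array[String]]): Map[String, List[String]] =
    firstTwo(data).toList
      .groupBy(_._1)
      .map { case (k, kvs) => k -> kvs.map(_._2) }

  def main(args: Array[String]): Unit = {
    firstTwo(rows).foreach(println) // one (key,value) tuple per line
    println(grouped(rows)("2345"))  // both values stored under key 2345
  }
}
```

On an actual RDD[Array[String]] the same map call should give an RDD[(String, String)], and pair RDDs offer collectAsMap and groupByKey for the two Map-style variants. That part is not exercised in the sketch above.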