Tags: scala, dataframe, apache-spark, apache-spark-sql, rdd

How to convert an RDD to a DataFrame in Scala


I read data from Elasticsearch into an RDD:

val es_rdd = sc.esRDD("indexname/typename",query="?q=*")

The RDD contains data like the following:

(uniqueId,Map(field -> value))
(uniqueId2,Map(field2 -> value2))

How can I convert this RDD of (String, Map) pairs to a DataFrame with three String columns, (String, String, String)?


Solution

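  • You can use explode to achieve it. Convert the RDD to a DataFrame, then call explode on the map column: each map entry becomes its own row, with the entry split into key and value columns.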

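      import spark.implicits._
      import org.apache.spark.sql.functions._

      // Sample RDD of (Long, Map[Long, Long]) pairs standing in for the Elasticsearch data
      val rdd = sc.range(1, 10).map(s => (s, Map(s -> s)))
      val ds = spark.createDataset(rdd)
      val df = ds.toDF()  // columns default to _1 and _2
      df.printSchema()
      df.show()

      // explode turns each map entry into a row with key and value columns
      df.select($"_1", explode($"_2")).show()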
    

    output:

    root
     |-- _1: long (nullable = false)
     |-- _2: map (nullable = true)
     |    |-- key: long
     |    |-- value: long (valueContainsNull = false)
    
    +---+--------+
    | _1|      _2|
    +---+--------+
    |  1|[1 -> 1]|
    |  2|[2 -> 2]|
    |  3|[3 -> 3]|
    |  4|[4 -> 4]|
    |  5|[5 -> 5]|
    |  6|[6 -> 6]|
    |  7|[7 -> 7]|
    |  8|[8 -> 8]|
    |  9|[9 -> 9]|
    +---+--------+
    
    +---+---+-----+
    | _1|key|value|
    +---+---+-----+
    |  1|  1|    1|
    |  2|  2|    2|
    |  3|  3|    3|
    |  4|  4|    4|
    |  5|  5|    5|
    |  6|  6|    6|
    |  7|  7|    7|
    |  8|  8|    8|
    |  9|  9|    9|
    +---+---+-----+
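
The sample above uses Long keys and values, while the question has (String, Map) pairs and wants three String columns. The sketch below shows the same row-per-entry flattening in plain Scala, with values converted to strings. The name `toTriples` is a hypothetical helper, not part of Spark or of the answer above, and the `Map[String, Any]` value type is an assumption about what the Elasticsearch connector returns.

```scala
import scala.collection.Map

object FlattenEsRows {
  // Hypothetical helper: turns one (id, fields) pair into one
  // (id, field, value) triple per map entry, mirroring what explode
  // does to a map column.
  def toTriples(row: (String, Map[String, Any])): Seq[(String, String, String)] = {
    val (id, fields) = row
    fields.toSeq.map { case (name, v) =>
      // Keep null values as null; stringify everything else
      val text: String = if (v == null) null else v.toString
      (id, name, text)
    }
  }

  def main(args: Array[String]): Unit = {
    // Data shaped like the example in the question
    val rows = Seq(
      ("uniqueId", Map("field" -> "value")),
      ("uniqueId2", Map("field2" -> "value2"))
    )
    rows.flatMap(toTriples).foreach(println)
  }
}
```

With Spark in scope, the same function could be passed to flatMap on the RDD, after which toDF("id", "field", "value") (available through spark.implicits._) would give a three-column DataFrame. That is an alternative to explode, not the approach used in the answer above, and the column names are illustrative.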