Transform Map[String,Any] to a dataframe in Scala


I have the following object of type Map[String,Any]:

d: Map[String,Any] = Map(count -> 1, results -> List(Map(A -> 1, C -> Hello, B -> Map(BA -> 0, BC -> 0)), Map(A -> 2, C -> Hi, B -> Map(BA -> 0, BC -> 0))), parameters -> Map(P1 -> 805, P2 -> 20230101))

I don't need most of this information. I only need to extract the list of maps under results into a DataFrame (ignoring the B field, since it is a nested map). The desired output would be:

+---------+----------------+
| A       |C               |
+---------+----------------+
|  1      |Hello           |
|  2      |Hi              |
+---------+----------------+

I tried:

val df = d
  .map( m => (m.get("A"),m.get("C")))
  .toDF("A", "C")

But I got:

error: value get is not a member of Any

Solution

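  • The values in a Map[String,Any] have static type Any, so the compiler does not know that get exists on them. Cast to the types you expect: d("results") with .asInstanceOf[Seq[Map[String, Any]]], and each extracted pair with .asInstanceOf[(Int, String)].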

    import org.apache.spark.sql.SparkSession
    
    val d: Map[String,Any] = Map(
      "count" -> 1, 
      "results" -> List(
        Map("A" -> 1, "C" -> "Hello", "B" -> Map("BA" -> 0, "BC" -> 0)), 
        Map("A" -> 2, "C" -> "Hi", "B" -> Map("BA" -> 0, "BC" -> 0))
      ), 
      "parameters" -> Map("P1" -> 805, "P2" -> 20230101)
    )
    
    val spark = SparkSession.builder
      .master("local")
      .appName("Spark app")
      .getOrCreate()
    
    import spark.implicits._
    
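    // d("results") has static type Any, so cast it to the shape we expect
    val df = d("results")
      .asInstanceOf[Seq[Map[String, Any]]]
      .map(m =>
        // the tuple is (Any, Any) here; the element types are not checked by the cast itself
        (m("A"), m("C")).asInstanceOf[(Int, String)]
      )
      .toDF("A", "C")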
    
    df.show()
    
    //+---+-----+
    //|  A|    C|
    //+---+-----+
    //|  1|Hello|
    //|  2|   Hi|
    //+---+-----+
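
If the data may not always have the expected shape, pattern matching can be a safer alternative to asInstanceOf. The following is only a sketch, written in plain Scala so it can be pasted into the Scala REPL or spark-shell; extractRows is a hypothetical helper name, not part of Spark. It skips any row where A is not an Int or C is not a String instead of failing at runtime.

```scala
// Same shape as the map in the question.
val d: Map[String, Any] = Map(
  "count" -> 1,
  "results" -> List(
    Map("A" -> 1, "C" -> "Hello", "B" -> Map("BA" -> 0, "BC" -> 0)),
    Map("A" -> 2, "C" -> "Hi", "B" -> Map("BA" -> 0, "BC" -> 0))
  ),
  "parameters" -> Map("P1" -> 805, "P2" -> 20230101)
)

// Hypothetical helper: returns (A, C) pairs, dropping rows that do not match.
def extractRows(data: Map[String, Any]): Seq[(Int, String)] =
  data.get("results") match {
    case Some(rawRows: Seq[_]) =>
      rawRows
        .collect {
          // Keep only elements that are maps; keys are assumed to be Strings.
          case row: Map[_, _] =>
            val m = row.asInstanceOf[Map[String, Any]]
            (m.get("A"), m.get("C"))
        }
        .collect {
          // Keep only rows where both fields exist with the expected types.
          case (Some(a: Int), Some(c: String)) => (a, c)
        }
    case _ => Seq.empty // "results" is missing or is not a sequence
  }

val rows = extractRows(d)
```

With import spark.implicits._ in scope, as in the answer above, rows.toDF("A", "C") should then build the same two-column DataFrame.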