
Extracting value from a nested array in Spark


I have the below output from my query:

id info
1 {"t":{"t_id":"5678","timestamp":1675664495,"tc_id":"12345","properties":[{"k":"psct","value":"IN"},{"k":"ptt","value":"SC"},{"k":"pcd","value":"IN, 2023-02-03"}]}},{"t":{"t_id":"1234","properties":[{"k":"psct","value":"IN"},{"k":"pctt","value":"SC"},{"k":"pcd","value":"IN, 2023-02-04"}],"timestamp":1675749213,"tc_id":"67890"}}

I want to extract k and value into separate columns.

My expected output is:

id t_id k value
1 5678 psct IN
1 5678 ptt SC
1 5678 pcd IN, 2023-02-03
1 1234 psct IN
1 1234 ptt SC
1 1234 pcd IN, 2023-02-04

I am not able to extract the values.

Can I get some assistance with this, please?

This needs to be written with the Spark DataFrame API.


Solution

  • Thanks for your help @Abdennacer Lachiheb.

    I managed to answer my own question. Below is the solution.

     // Required imports, assuming a SparkSession named `spark`
     import org.apache.spark.sql.functions._
     import org.apache.spark.sql.types._
     import spark.implicits._

     val test = spark.sql(s"""select
                                id,
                                info
                        from kh.details""")
                        .withColumn("info_new", explode(col("info")))                              // one row per "t" entry
                        .withColumn("t_id", get_json_object($"info_new", "$.t.t_id"))              // pull t_id out of each entry
                        .withColumn("properties", get_json_object($"info_new", "$.t.properties"))  // the properties array as a JSON string
                        .withColumn("props", from_json($"properties", ArrayType(StringType)))      // parse into an array of JSON strings
                        .withColumn("props_info", explode($"props"))                               // one row per property
                        .withColumn("k", get_json_object($"props_info", "$.k"))                    // the JSON field is "k", not "key"
                        .withColumn("value", get_json_object($"props_info", "$.value"))
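
    An alternative sketch that avoids repeated get_json_object calls is to parse each entry once with from_json and an explicit schema, then flatten the typed structs. The table and column names (kh.details, id, info) are taken from the question; the schema below is my assumption based on the sample JSON:

    ```scala
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._
    import spark.implicits._

    // Assumed schema for one element of the info array, matching the sample JSON
    val tSchema = new StructType()
      .add("t", new StructType()
        .add("t_id", StringType)
        .add("timestamp", LongType)
        .add("tc_id", StringType)
        .add("properties", ArrayType(new StructType()
          .add("k", StringType)
          .add("value", StringType))))

    val result = spark.table("kh.details")
      .withColumn("entry", explode($"info"))              // one row per "t" entry
      .withColumn("parsed", from_json($"entry", tSchema)) // typed struct instead of JSON strings
      .withColumn("prop", explode($"parsed.t.properties")) // one row per property
      .select($"id",
              $"parsed.t.t_id".as("t_id"),
              $"prop.k".as("k"),
              $"prop.value".as("value"))
    ```

    With a typed schema, Spark can prune unused fields and you get real columns (not strings) for any numeric fields you may need later, such as timestamp.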