I have below output from my query.
id | info |
---|---|
1 | {"t":{"t_id":"5678","timestamp":1675664495,"tc_id":"12345","properties":[{"k":"psct","value":"IN"},{"k":"ptt","value":"SC"},{"k":"pcd","value":"IN, 2023-02-03"}]}},{"t":{"t_id":"1234","properties":[{"k":"psct","value":"IN"},{"k":"pctt","value":"SC"},{"k":"pcd","value":"IN, 2023-02-04"}],"timestamp":1675749213,"tc_id":"67890"}} |
I want to get the data for k
and value
in different columns
My output should be like this:
id | t_id | k | value |
---|---|---|---|
1 | 5678 | psct | IN |
1 | 5678 | ptt | SC |
1 | 5678 | pcd | IN, 2023-02-03 |
1 | 1234 | psct | IN |
1 | 1234 | ptt | SC |
1 | 1234 | pcd | IN, 2023-02-04 |
Not able to extract the values.
Can I get some assistance on this please?
These needs to be written in Spark DataFrame.
Thanks for you help @Abdennacer Lachiheb.
I managed to get my question answered myself. Below is the solution.
val test = spark.sql(s"""select
t_id,
info
from kh.details""")
.withColumn("info_new", explode(col("info")))
.withColumn("properties", get_json_object($"info_new", "$.t.properties"))
.withColumn("props", from_json($"properties",ArrayType(StringType)))
.withColumn("props_info",explode($"props"))
.withColumn("key",get_json_object($"props_info", "$.key"))
.withColumn("value",get_json_object($"props_info", "$.value"))