I am new to Spark. I am trying to flatten a dataframe but failed to do so with "explode".
The original dataframe schema is like below:
ID|ApprovalJSON
1|[{"ApproverType":"1st Line Manager","Status":"Approved"},{"ApproverType":"2nd Line Manager","Status":"Approved"}]
2|[{"ApproverType":"1st Line Manager","Status":"Approved"},{"ApproverType":"2nd Line Manager","Status":"Rejected"}]
I need to convert it to below schema?
ID|ApprovalType|Status
1|1st Line Manager|Approved
1|2nd Line Manager|Approved
2|1st Line Manager|Approved
2|2nd Line Manager|Rejected
I have tried
df_exploded = df.withColumn("ApprovalJSON", explode("ApprovalJSON"))
But I got error:
Cannot resolve "explode(ApprovalJSON)" due to data type mismatch: parameter 1 requires ("ARRAY" or "MAP") type, however, "ApprovalJSON" is of "STRING" type.;
First parse the JSON like strings to array of structs then use inline
to explode array into rows and columns
df1 = df.withColumn("ApprovalJSON", F.from_json("ApprovalJSON", schema="array<struct<ApproverType string, Status string>>"))
df1 = df1.select("ID", F.inline('ApprovalJSON'))
Result
df1.show()
+---+----------------+--------+
| ID| ApproverType| Status|
+---+----------------+--------+
| 1|1st Line Manager|Approved|
| 1|2nd Line Manager|Approved|
| 2|1st Line Manager|Approved|
| 2|2nd Line Manager|Rejected|
+---+----------------+--------+