I have a jsonb column that came from database and i want to split this Column and create separate columns for the Values,Field,Timestamp. My current dataframe is :
Id | Fields" |
---|---|
1009331137 | {"101011182899": {"Value": "12.3", "Timestamp": "2021-01-08"}, "101011402899": {"Value": "34.2", "Timestamp": "2021-01-08"}} |
This is the desired output :
Id | Field | Value | timestamp |
---|---|---|---|
1009331137 | 101011182899 | 12.3 | 2021-01-08 |
1009331137 | 101011402899 | 34.2 | 2021-01-08 |
We can slightly change the structure of the dictionary to get the output we want. So let's convert nested dictionary to flattened dictionary.
df["Fields"] = df["Fields"].apply(lambda x: [{"Field":k,**v} for k,v in x.items()])
New format:
id Fields
0 1009331137 [{'Field': '101011182899', 'Value': '12.3', 'Timestamp': '2021-01-08'}, {'Field': '101011402899', 'Value': '34.2', 'Timestamp': '2021-01-08'}]
Now we can use explode()
and json_normalize()
df = df.explode("Fields").reset_index(drop=True)
df = df.join(pd.json_normalize(df.pop("Fields")))
Out:
id Field Value Timestamp
0 1009331137 101011182899 12.3 2021-01-08
1 1009331137 101011402899 34.2 2021-01-08