Let's assume I have data which is structured like such:
{
"_id" : 245,
"connId" : "3r34b32",
"roomList" : [
{
"reportId" : 29,
"siteId" : 1
}]
}
How do I go about getting the keys from the dictionary within the list, along with the value in order for it to have this expected output below.
Expected Output:
ID,connID,reportId,siteId
245,3r34b32,29,1
I'm using json_normalize
but this does not work with lists, so I need to add an extra bit of code which can unpack the list that contains a dictionary.
Any help will be appreciated.
You have a nested record. You can handle them separately with record_path
and them concatenate them with pd.concat()
root = pd.json_normalize(d).drop('roomList',1)
nested = pd.json_normalize(d, record_path='roomList')
output = pd.concat([root, nested],axis=1)
print(output)
_id connId reportId siteId
0 245 3r34b32 29 1
If you don't want to specify a column name explicitly you can do the following. This checks which of the columns after normalization are nested, separately normalizes them and concatenates all of them. I have added another nested JSON for sake of example -
root = pd.json_normalize(d)
nested_cols = [i for i in root.columns if isinstance(root[i][0], list)]
l = [root.drop(nested_cols,1),]
for i in nested_cols:
l.append(pd.json_normalize(d, record_path=i))
output = pd.concat(l, axis=1)
print(output)
_id connId reportId siteId reportId2 siteId2
0 245 3r34b32 29 1 39 5
NOTE: If some of your nested records have same names, you many want to use some prefix while normalizing them. Check this for more details.