Despite looking through numerous examples+solutions I haven't found anything that works on this particular JSON schema.
I have this line of JSON coming from an API:
x = {'rows': [{'columns':
[{'name': 'User ID', 'value': '0000123'},
{'name': 'Last Name', 'value': 'Test1'}]},
{'columns':
[{'name': 'User ID', 'value': '0000567'},
{'name': 'Last Name', 'value': 'Test2'}]}]}
And for some reason I can't get it to output with various forms of json_normalize, the closest I can get is for them to stack.
pd.json_normalize(data=x,record_path=['rows',['columns']])
With result:
name value
0 User ID 0000123
1 Last Name Test1
2 User ID 0000567
3 Last Name Test2
But I'm looking for it to output:
User ID Last Name
0000123 Test1
0000567 Test2
I'm looking into pivot, but there's no usable index. Could I append the array value? Alternatively, there may just be a better way to create the dataframe, so it can normalize easily.
Help appreciated.
Someone may have a better answer, but if you check the source of json_normalize
(link) it is "Disastrously inefficient for now" and just uses list comprehension and loops. So you likely won't lose much efficiency by normalizing your data manually. Here's an example written for clarity (i.e., this won't scale great if you're retrieving large jsons) that loops over your data to build the fields:
normalized_data={'User ID':[],'Last Name':[]}
for row in x['rows']:
for column in row['columns']:
normalized_data[column['name']].append(column['value'])
df=pd.DataFrame(normalized_data)
print(df)
prints:
User ID Last Name
0 0000123 Test1
1 0000567 Test2
Obviously not ideal and would love to see other answers, but given that jsons come with open ended formatting it'll be hard to find a pandas method that works out of the box if you're not lucky enough to have your json structure match those that pandas can handle.