Search code examples
pythonjsonpivotnormalize

Issue with JSON nested values through Python json_normalize


Despite looking through numerous examples+solutions I haven't found anything that works on this particular JSON schema.

I have this line of JSON coming from an API:

x = {'rows': [{'columns':
                 [{'name': 'User ID', 'value': '0000123'},
                 {'name': 'Last Name', 'value': 'Test1'}]},
              {'columns':
                 [{'name': 'User ID', 'value': '0000567'},
                 {'name': 'Last Name', 'value': 'Test2'}]}]}

And for some reason I can't get it to output with various forms of json_normalize, the closest I can get is for them to stack.

pd.json_normalize(data=x,record_path=['rows',['columns']])

With result:

        name    value
0    User ID  0000123
1  Last Name    Test1
2    User ID  0000567
3  Last Name    Test2

But I'm looking for it to output:

User ID  Last Name
0000123  Test1
0000567  Test2

I'm looking into pivot, but there's no usable index. Could I append the array value? Alternatively, there may just be a better way to create the dataframe, so it can normalize easily.

Help appreciated.


Solution

  • Someone may have a better answer, but if you check the source of json_normalize (link) it is "Disastrously inefficient for now" and just uses list comprehension and loops. So you likely won't lose much efficiency by normalizing your data manually. Here's an example written for clarity (i.e., this won't scale great if you're retrieving large jsons) that loops over your data to build the fields:

    normalized_data={'User ID':[],'Last Name':[]}
    for row in x['rows']:
        for column in row['columns']:
            normalized_data[column['name']].append(column['value'])
    
    df=pd.DataFrame(normalized_data)
    print(df)
    

    prints:

       User ID Last Name
    0  0000123     Test1
    1  0000567     Test2
    

    Obviously not ideal and would love to see other answers, but given that jsons come with open ended formatting it'll be hard to find a pandas method that works out of the box if you're not lucky enough to have your json structure match those that pandas can handle.