Search code examples
jsonpython-3.xpandasdataframejson-normalize

How to read in pandas dataframe complex JSON files containing many down levels


I already browsed Stack Overflow and many sites, but I could not find a solution resolving my problem. I am trying to convert a JSON file received from a multi-dimensionnal inference engine (AI), to an Excel format through a pandas dataframe. This JSON file has a quite complex structure with many levels down (I am not familiar with JSON language).

Here is the JSON format:

    {
"data": {
    "queries": [{
            "id": 292,
            "**name**": "Data_8",
            "queryId": 0,
            "values": {
                "Entreprise": {
                    "F1": {
                        "rule": 1450,
                        "**value**": "1.000000000000"
                    }
                }
            }
        }, {
            "id": 430,
            "name": "Data_9",
            "queryId": 1,
            "values": {
                "Entreprise": {
                    "F1": {
                        "rule": 1437,
                        "value": "N"
                    }
                }
            }
        }, {
            "id": 359,
            "**name**": "Data_10",
            "queryId": 2,
            "values": {
                "Entreprise": {
                    "F1": {
                        "rule": 876,
                        "**value**": "O"
                    }
                }
            }
        }, and so on.

I load this file with:

    with open(output_JSON) as data_file:    
         data2= json.load(data_file)

From the file, I need to retrieve only two fields: name and the correspondant value, in two columns.

Desired output of the dataframe is :

              name            value
   0          Data_8            1
   1          Data_9            N
   2          Data_10           O 

Thank you for your time and your help.

Anyway, take care and stay safe @ home.

Greetings from Paris, France :)

UPDATE (2/05/2020): File is loaded with :

with open(output_JSON) as data_file:    
             data2= json.load(data_file)

then:

df = pd.DataFrame(data2['data'])
print (df)

returns:

                                              queries
0   {'id': 292, 'name': 'Data_8', 'queryId': 0, 'v...
1   {'id': 430, 'name': 'Data_9', 'queryId': 1, 'v...
2   {'id': 359, 'name': 'Data_10', 'queryId': 2, '...

then:

df2=df["queries"]
print (df2)

returns quite the same:

0     {'id': 292, 'name': 'Data_8', 'queryId': 0, 'v...
1     {'id': 430, 'name': 'Data_9', 'queryId': 1, 'v...
2     {'id': 359, 'name': 'Data_10', 'queryId': 2, '...

At this point I dont know how to get name and value columns in a dataframe.

Thank you again for your time :)


Solution

  • Use json_normalize

    • If the pandas version is < 1, use from pandas.io.json import json_normalize

    data2

    • This is the data I have in a file called test.json
    {
        "data": {
            "queries": [{
                    "id": 292,
                    "name": "Data_8",
                    "queryId": 0,
                    "values": {
                        "Entreprise": {
                            "F1": {
                                "rule": 1450,
                                "value": "1.000000000000"
                            }
                        }
                    }
                }, {
                    "id": 430,
                    "name": "Data_9",
                    "queryId": 1,
                    "values": {
                        "Entreprise": {
                            "F1": {
                                "rule": 1437,
                                "value": "N"
                            }
                        }
                    }
                }, {
                    "id": 359,
                    "name": "Data_10",
                    "queryId": 2,
                    "values": {
                        "Entreprise": {
                            "F1": {
                                "rule": 876,
                                "value": "O"
                            }
                        }
                    }
                }
            ]
        }
    }
    
    
    import pandas as pd
    from pandas.io.json import json_normalize  # not needed for current pandas
    from pathlib import Path
    import json
    
    # load test.json
    p = Path(r'c:\Users\...\test.json')
    
    with p.open('r', encoding='utf-8') as f:
        data2 = json.loads(f.read())
    
    
    # create the dataframe
    # df = pd.json_normalize(data2, ['data', 'queries'])  # use for modern pandas
    df = json_normalize(data2, ['data', 'queries'])
    
    df.rename(columns={'values.Entreprise.F1.value': 'value'}, inplace=True)
    
    df2 = df[['name', 'value']].copy()
    
    print(df)
    
        id     name  queryId  values.Entreprise.F1.rule           value queries.name
    0  292   Data_8        0                       1450  1.000000000000          NaN
    1  430   Data_9        1                       1437               N          NaN
    2  359  Data_10        2                        876               O          NaN
    
    print(df2)
    
          name           value
    0   Data_8  1.000000000000
    1   Data_9               N
    2  Data_10               O