json python-3.x pandas dataframe json-normalize

How to read in pandas dataframe complex JSON files containing many down levels

I already browsed Stack Overflow and many sites, but I could not find a solution resolving my problem. I am trying to convert a JSON file received from a multi-dimensionnal inference engine (AI), to an Excel format through a pandas dataframe. This JSON file has a quite complex structure with many levels down (I am not familiar with JSON language).

Here is the JSON format:

    {
"data": {
    "queries": [{
            "id": 292,
            "**name**": "Data_8",
            "queryId": 0,
            "values": {
                "Entreprise": {
                    "F1": {
                        "rule": 1450,
                        "**value**": "1.000000000000"
                    }
                }
            }
        }, {
            "id": 430,
            "name": "Data_9",
            "queryId": 1,
            "values": {
                "Entreprise": {
                    "F1": {
                        "rule": 1437,
                        "value": "N"
                    }
                }
            }
        }, {
            "id": 359,
            "**name**": "Data_10",
            "queryId": 2,
            "values": {
                "Entreprise": {
                    "F1": {
                        "rule": 876,
                        "**value**": "O"
                    }
                }
            }
        }, and so on.

I load this file with:

    with open(output_JSON) as data_file:    
         data2= json.load(data_file)

From the file, I need to retrieve only two fields: name and the correspondant value, in two columns.

Desired output of the dataframe is :

              name            value
   0          Data_8            1
   1          Data_9            N
   2          Data_10           O

Thank you for your time and your help.

Anyway, take care and stay safe @ home.

Greetings from Paris, France :)

UPDATE (2/05/2020): File is loaded with :

with open(output_JSON) as data_file:    
             data2= json.load(data_file)

then:

df = pd.DataFrame(data2['data'])
print (df)

returns:

                                              queries
0   {'id': 292, 'name': 'Data_8', 'queryId': 0, 'v...
1   {'id': 430, 'name': 'Data_9', 'queryId': 1, 'v...
2   {'id': 359, 'name': 'Data_10', 'queryId': 2, '...

then:

df2=df["queries"]
print (df2)

returns quite the same:

0     {'id': 292, 'name': 'Data_8', 'queryId': 0, 'v...
1     {'id': 430, 'name': 'Data_9', 'queryId': 1, 'v...
2     {'id': 359, 'name': 'Data_10', 'queryId': 2, '...

At this point I dont know how to get name and value columns in a dataframe.

Thank you again for your time :)

Solution

Use `json_normalize`

If the pandas version is < 1, use from pandas.io.json import json_normalize

`data2`

This is the data I have in a file called test.json

{
    "data": {
        "queries": [{
                "id": 292,
                "name": "Data_8",
                "queryId": 0,
                "values": {
                    "Entreprise": {
                        "F1": {
                            "rule": 1450,
                            "value": "1.000000000000"
                        }
                    }
                }
            }, {
                "id": 430,
                "name": "Data_9",
                "queryId": 1,
                "values": {
                    "Entreprise": {
                        "F1": {
                            "rule": 1437,
                            "value": "N"
                        }
                    }
                }
            }, {
                "id": 359,
                "name": "Data_10",
                "queryId": 2,
                "values": {
                    "Entreprise": {
                        "F1": {
                            "rule": 876,
                            "value": "O"
                        }
                    }
                }
            }
        ]
    }
}

import pandas as pd
from pandas.io.json import json_normalize  # not needed for current pandas
from pathlib import Path
import json

# load test.json
p = Path(r'c:\Users\...\test.json')

with p.open('r', encoding='utf-8') as f:
    data2 = json.loads(f.read())


# create the dataframe
# df = pd.json_normalize(data2, ['data', 'queries'])  # use for modern pandas
df = json_normalize(data2, ['data', 'queries'])

df.rename(columns={'values.Entreprise.F1.value': 'value'}, inplace=True)

df2 = df[['name', 'value']].copy()

print(df)

    id     name  queryId  values.Entreprise.F1.rule           value queries.name
0  292   Data_8        0                       1450  1.000000000000          NaN
1  430   Data_9        1                       1437               N          NaN
2  359  Data_10        2                        876               O          NaN

print(df2)

      name           value
0   Data_8  1.000000000000
1   Data_9               N
2  Data_10               O

How to read in pandas dataframe complex JSON files containing many down levels

Use json_normalize

data2

Use `json_normalize`

`data2`