Search code examples
pythonjsonpandasjson-normalize

How to normalise complex JSON with multiple levels of nested information in Python


I get the data in a predefined format I cannot change:

data = 
{
  "id": 12345,
  "name": "Doe",
  "gender": {
    "textEn": "Masculin"
  },
  "professions": [
    {
      "job_description": {
        "textEn": "Job description"
      },
      "cetTitles": [
        {
          "cetTitleType": {
            "textEn": "Recognition"
          },
          "issuanceDate": "1992-04-14T00:00:00Z",
          "phoneNumbers": [
            "123 221 00 70"
          ]
        }
      ]
    }
  ]
}

While I can normalise data one level down ('gender') through the pd.json_normalize function, I struggle to access info further down in the hierarchy.

I tried e.g. getting the job description through data = pd.json_normalize(data,record_path=['professions','job_description'],meta='id') but I get a TypeError. How can I address this and extract all data into a table so the data above is represented as one row with all content as fields?

The output I would like to ultimately achieve looks as follows:

id name gender job_description cetTitleType issuanceDate phoneNumbers
12345 Doe Masculin Job description Recognition 1992-04-14T00:00:00Z 123 221 00 70

Solution

  • Using json_normalize():

    Code:

    df = pd.json_normalize(
        data=data,
        record_path=["professions", "cetTitles"],
        meta=["id", "name", "gender", ["professions", "job_description"]]
    ).explode(column="phoneNumbers")
    
    df = df.join(pd.DataFrame(df.pop("gender").values.tolist()))
    df.columns = df.columns.str.split(".").str[-1]
    df = df.rename(columns={"textEn": "gender"})
    df = df.join(pd.DataFrame(df.pop("job_description").values.tolist()))
    df = df.rename(columns={"textEn": "job_description"})
    print(df)
    

    Output:

               issuanceDate   phoneNumbers       gender     id name    gender  job_description
    0  1992-04-14T00:00:00Z  123 221 00 70  Recognition  12345  Doe  Masculin  Job description