Search code examples
pythonjsonpandasnestednormalize

Put another fields with the nested JSON structure to pandas dataframe python


I am trying to put other fields from the json after normalize the nested part. I had followed some examples like I need but I don't understand why the error : KeyError: 'phone Number'

In others words I need put another fields from the json in order to complet the neested fields and get a better structure in my panda dataframe

This is my code :

import json
import pandas as pd

data ={
  "consumer": {
    "phoneNumber": "3156578877",
    "channelId": "83",
    "appId": "APP_DAVIPLATA",
    "moduleId": "MA_PSE_VNZ",
    "sessionId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "appVersion": "16.1.1",
    "soVersion": "Android 11",
    "agentInfo": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36",
    "ipDevice": "163.111.221.230"
  },
  "transactionHeader": {
    "transactionType": "LOG",
    "transactionId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "transactionDate": "2018-07-03T17:54:36.762-05:00"
  },
  "transactionDetail": {
    "logType": "ANALITICA_OPERACIONAL",
    "MediaTarjetId": "PSE_VNZ_1",
    "Consumer": {
      "phoneNumber": "3156578877",
      "channelId": "83",
      "appId": "APP_DAVIPLATA",
      "moduleId": "MA_PSE_VNZ",
      "sessionId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "appVersion": "16.1.1",
      "soVersion": "Android 11",
      "agentInfo": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36",
      "ipDevice": "163.111.221.230"
    },
    "Transaction": {
      "transactionType": "LOG",
      "transactionId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "transactionDate": "2018-07-03T17:54:36.762-05:00"
    },
    "Client": {
      "identificationType": "CC",
      "identificationNumber": "1027868487",
      "documentExpeditionDate": "2009-05-10",
      "documentExpeditionPlace": "Bogota"
    },
    "Product": {
      "productCode": "DVP_CO",
      "productNumber": "3158765639"
    },
    "Messages": {
      "OperationalAnalytics": [
        {
          "nameField": "fecha_transaccion",
          "valueField": "2022071518:35:50",
          "valueFormat": "YYYYMMDDHH:MM:SS"
        },
        {
          "nameField": "nombre_transaccion",
          "valueField": "DEBITO PAGO",
          "valueFormat": "String"
        },
        {
          "nameField": "valor",
          "valueField": "5300",
          "valueFormat": "Number"
        },
        {
          "nameField": "referencia_destino",
          "valueField": "3156547865",
          "valueFormat": "String"
        }
      ]
    }
  }
}

pd.json_normalize(data, record_path = ['transactionDetail','Messages','OperationalAnalytics'],meta = [['consumer','phoneNumber'], 'transactionHeader'])

KeyError: 'phoneNumber'

In this code some like I need it worked

import pandas as pd
data = [
    {
        "company": "Google",
        "tagline": "Dont be evil",
        "management": {"CEO": "Sundar Pichai"},
        "department": [
            {"name": "Gmail", "revenue (bn)": 123},
            {"name": "GCP", "revenue (bn)": 400},
            {"name": "Google drive", "revenue (bn)": 600},
        ],
    },
    {
        "company": "Microsoft",
        "tagline": "Be What's Next",
        "management": {"CEO": "Satya Nadella"},
        "department": [
            {"name": "Onedrive", "revenue (bn)": 13},
            {"name": "Azure", "revenue (bn)": 300},
            {"name": "Microsoft 365", "revenue (bn)": 300},
        ],
    },
  
]
result = pd.json_normalize(
    data, "department", ["company", "tagline", ["management", "CEO"]]
)
result

Thanks for your help


Solution

  • The json in your working example is a list of dictionaries while the one you are trying to transform is a dictionary.

    You can access the dictionary elements directly with data[key] and convert them to dataframes that you will concatenate... One tricky part of your example are the keys in 'OperationalAnalytics'. I guess you would want the nameField values to be column names so that you get one line per transaction. This can be achieved by setting the nameField column to index and transposing:

    pd.concat([pd.DataFrame([data["consumer"]])["phoneNumber"],
                pd.DataFrame([data["transactionHeader"]]),
                pd.DataFrame(data['transactionDetail']['Messages']['OperationalAnalytics']
                            ).set_index('nameField')[['valueField']].T.reset_index(drop=True)
              ], axis=1)
    

    Output:

      phoneNumber transactionType                         transactionId                transactionDate fecha_transaccion nombre_transaccion valor referencia_destino
    0  3156578877             LOG  3fa85f64-5717-4562-b3fc-2c963f66afa6  2018-07-03T17:54:36.762-05:00  2022071518:35:50        DEBITO PAGO  5300         3156547865
    

    Edit: to get the result of your linked picture, it's actually simpler (although I don't get why you'd want a transaction on several rows):

    pd.concat([pd.DataFrame([data["consumer"]])["phoneNumber"],
                pd.DataFrame([data["transactionHeader"]]),
                pd.DataFrame(data['transactionDetail']['Messages']['OperationalAnalytics'])[['nameField', 'valueField']]
              ], axis=1).ffill()