I am trying to put other fields from the json after normalize the nested part. I had followed some examples like I need but I don't understand why the error : KeyError: 'phone Number'
In others words I need put another fields from the json in order to complet the neested fields and get a better structure in my panda dataframe
This is my code :
import json
import pandas as pd
data ={
"consumer": {
"phoneNumber": "3156578877",
"channelId": "83",
"appId": "APP_DAVIPLATA",
"moduleId": "MA_PSE_VNZ",
"sessionId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"appVersion": "16.1.1",
"soVersion": "Android 11",
"agentInfo": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36",
"ipDevice": "163.111.221.230"
},
"transactionHeader": {
"transactionType": "LOG",
"transactionId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"transactionDate": "2018-07-03T17:54:36.762-05:00"
},
"transactionDetail": {
"logType": "ANALITICA_OPERACIONAL",
"MediaTarjetId": "PSE_VNZ_1",
"Consumer": {
"phoneNumber": "3156578877",
"channelId": "83",
"appId": "APP_DAVIPLATA",
"moduleId": "MA_PSE_VNZ",
"sessionId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"appVersion": "16.1.1",
"soVersion": "Android 11",
"agentInfo": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36",
"ipDevice": "163.111.221.230"
},
"Transaction": {
"transactionType": "LOG",
"transactionId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"transactionDate": "2018-07-03T17:54:36.762-05:00"
},
"Client": {
"identificationType": "CC",
"identificationNumber": "1027868487",
"documentExpeditionDate": "2009-05-10",
"documentExpeditionPlace": "Bogota"
},
"Product": {
"productCode": "DVP_CO",
"productNumber": "3158765639"
},
"Messages": {
"OperationalAnalytics": [
{
"nameField": "fecha_transaccion",
"valueField": "2022071518:35:50",
"valueFormat": "YYYYMMDDHH:MM:SS"
},
{
"nameField": "nombre_transaccion",
"valueField": "DEBITO PAGO",
"valueFormat": "String"
},
{
"nameField": "valor",
"valueField": "5300",
"valueFormat": "Number"
},
{
"nameField": "referencia_destino",
"valueField": "3156547865",
"valueFormat": "String"
}
]
}
}
}
pd.json_normalize(data, record_path = ['transactionDetail','Messages','OperationalAnalytics'],meta = [['consumer','phoneNumber'], 'transactionHeader'])
KeyError: 'phoneNumber'
In this code some like I need it worked
import pandas as pd
data = [
{
"company": "Google",
"tagline": "Dont be evil",
"management": {"CEO": "Sundar Pichai"},
"department": [
{"name": "Gmail", "revenue (bn)": 123},
{"name": "GCP", "revenue (bn)": 400},
{"name": "Google drive", "revenue (bn)": 600},
],
},
{
"company": "Microsoft",
"tagline": "Be What's Next",
"management": {"CEO": "Satya Nadella"},
"department": [
{"name": "Onedrive", "revenue (bn)": 13},
{"name": "Azure", "revenue (bn)": 300},
{"name": "Microsoft 365", "revenue (bn)": 300},
],
},
]
result = pd.json_normalize(
data, "department", ["company", "tagline", ["management", "CEO"]]
)
result
Thanks for your help
The json in your working example is a list of dictionaries while the one you are trying to transform is a dictionary.
You can access the dictionary elements directly with data[key]
and convert them to dataframes that you will concatenate... One tricky part of your example are the keys in 'OperationalAnalytics'. I guess you would want the nameField
values to be column names so that you get one line per transaction. This can be achieved by setting the nameField
column to index and transposing:
pd.concat([pd.DataFrame([data["consumer"]])["phoneNumber"],
pd.DataFrame([data["transactionHeader"]]),
pd.DataFrame(data['transactionDetail']['Messages']['OperationalAnalytics']
).set_index('nameField')[['valueField']].T.reset_index(drop=True)
], axis=1)
Output:
phoneNumber transactionType transactionId transactionDate fecha_transaccion nombre_transaccion valor referencia_destino
0 3156578877 LOG 3fa85f64-5717-4562-b3fc-2c963f66afa6 2018-07-03T17:54:36.762-05:00 2022071518:35:50 DEBITO PAGO 5300 3156547865
Edit: to get the result of your linked picture, it's actually simpler (although I don't get why you'd want a transaction on several rows):
pd.concat([pd.DataFrame([data["consumer"]])["phoneNumber"],
pd.DataFrame([data["transactionHeader"]]),
pd.DataFrame(data['transactionDetail']['Messages']['OperationalAnalytics'])[['nameField', 'valueField']]
], axis=1).ffill()