I am running into trouble using json_normalize when the record_path points to a column where there is a nested dict which then contains a list. See below for example.
Given the following:
list_of_dict = [
{
'SCHOOL_NAME': 'SCHOOL_A',
'STUDENTS': [
{
'STUDENT_NAME': 'JOHN',
'STUDENT_ID': '1'
},
{
'STUDENT_NAME': 'JANE',
'STUDENT_ID': '2'
},
]
},
{
'SCHOOL_NAME': 'SCHOOL_B',
'STUDENTS': [
{
'STUDENT_NAME': 'HENRY',
'STUDENT_ID': '1'
},
{
'STUDENT_NAME': 'MARK',
'STUDENT_ID': '2'
},
]
}]
I can flatten this using
pd.json_normalize(data=list_of_dict, record_path='STUDENTS', meta=['SCHOOL_NAME'])[['SCHOOL_NAME', 'STUDENT_ID', 'STUDENT_NAME']]
to obtain the following:
How can I obtain similar output format if the list_of_dict is structured as follows: ** NOTE THE ADDITION OF STUDENT_LIST **
list_of_dict = [
{
'SCHOOL_NAME': 'SCHOOL_A',
'STUDENT_LIST':{
'STUDENTS': [
{
'STUDENT_NAME': 'JOHN',
'STUDENT_ID': '1'
},
{
'STUDENT_NAME': 'JANE',
'STUDENT_ID': '2'
},
]
}
},
{
'SCHOOL_NAME': 'SCHOOL_B',
'STUDENT_LIST': {
'STUDENTS': [
{
'STUDENT_NAME': 'HENRY',
'STUDENT_ID': '1'
},
{
'STUDENT_NAME': 'MARK',
'STUDENT_ID': '2'
},
]
}
}]
Use dict comprehension
with pop
:
# Just pop key `STUDENT_LIST` and your list_of_dict is back like before
In [680]: a = [{**x, **x.pop('STUDENT_LIST')} for x in list_of_dict]
# Now use `json_normalize`
In [684]: pd.json_normalize(a, record_path='STUDENTS', meta=['SCHOOL_NAME'])
Out[684]:
STUDENT_NAME STUDENT_ID SCHOOL_NAME
0 JOHN 1 SCHOOL_A
1 JANE 2 SCHOOL_A
2 HENRY 1 SCHOOL_B
3 MARK 2 SCHOOL_B