Search code examples
python-3.xpandasjson-normalize

Using json_normalize to build table from nested values


I am running into trouble using json_normalize when the record_path points to a column where there is a nested dict which then contains a list. See below for example.

Given the following:

list_of_dict = [
{
    'SCHOOL_NAME': 'SCHOOL_A',
    'STUDENTS': [
        {
            'STUDENT_NAME': 'JOHN',
            'STUDENT_ID': '1'
        },
        {
            'STUDENT_NAME': 'JANE',
            'STUDENT_ID': '2'
        },
    ]
},
{
    'SCHOOL_NAME': 'SCHOOL_B',
    'STUDENTS': [
        {
            'STUDENT_NAME': 'HENRY',
            'STUDENT_ID': '1'
        },
        {
            'STUDENT_NAME': 'MARK',
            'STUDENT_ID': '2'
        },
    ]
}]

I can flatten this using

pd.json_normalize(data=list_of_dict, record_path='STUDENTS', meta=['SCHOOL_NAME'])[['SCHOOL_NAME', 'STUDENT_ID', 'STUDENT_NAME']]

to obtain the following:

Expected Output

How can I obtain similar output format if the list_of_dict is structured as follows: ** NOTE THE ADDITION OF STUDENT_LIST **

list_of_dict = [
{
    'SCHOOL_NAME': 'SCHOOL_A',
    'STUDENT_LIST':{
        'STUDENTS': [
            {
                'STUDENT_NAME': 'JOHN',
                'STUDENT_ID': '1'
            },
            {
                'STUDENT_NAME': 'JANE',
                'STUDENT_ID': '2'
            },
        ]
    }

},
{
    'SCHOOL_NAME': 'SCHOOL_B',
    'STUDENT_LIST': {
        'STUDENTS': [
            {
                'STUDENT_NAME': 'HENRY',
                'STUDENT_ID': '1'
            },
            {
                'STUDENT_NAME': 'MARK',
                'STUDENT_ID': '2'
            },
        ]
    }
}]

Solution

  • Use dict comprehension with pop:

    # Just pop key `STUDENT_LIST` and your list_of_dict is back like before
    In [680]: a = [{**x, **x.pop('STUDENT_LIST')} for x in list_of_dict]
    
    # Now use `json_normalize`
    In [684]: pd.json_normalize(a, record_path='STUDENTS', meta=['SCHOOL_NAME'])
    Out[684]: 
      STUDENT_NAME STUDENT_ID SCHOOL_NAME
    0         JOHN          1    SCHOOL_A
    1         JANE          2    SCHOOL_A
    2        HENRY          1    SCHOOL_B
    3         MARK          2    SCHOOL_B