Search code examples
pythonlistdataframedictionarynested

how do I create dataframe from list of nested dictionary


I have the following list with nested dictionary and I would like to convert it to a dataframe or table.

dict_items = [('Grant June_GR-0483',
               [{'Chem': {'Quiz': 127.0, 'HW': 135.0, 'ATTND': 17.0, 'Exam': 46.0}},
                {'Bio': {'Quiz': 184.0, 'HW': 186.0, 'ATTND': 58.0, 'Exam': 97.0}},
                {'Phy': {'Quiz': 52.0, 'HW': 142.0, 'ATTND': 29.0, 'Exam': 73.0}},
                {'Subject Scores': [52.615526315789474,
                                    92.69684210526316,
                                    54.436052631578946],
                 'Overall Score': 66.58,
                 'Grade': 'F',
                 'GPA': 3.3289999999999997,
                 'Status': 'Fail'}]),
              ('Dave',
               [{'Chem': {'Quiz': 23.0, 'HW': 55.0, 'ATTND': 67.0, 'Exam': 43.0}},
                {'Subject Scores': [34.83868421052631],
                 'Overall Score': 34.84,
                 'Grade': 'F',
                 'GPA': 1.7420000000000002,
                 'Status': 'Fail'}]),
              ('Ben',
               [{'Chem': {'Quiz': 23.0, 'HW': 34.0, 'ATTND': 56.0, 'Exam': 67.0}},
                {'Subject Scores': [42.125789473684215],
                 'Overall Score': 42.13,
                 'Grade': 'F',
                 'GPA': 2.1065,
                 'Status': 'Fail'}])]

I tried this but it does not give me the columns as I want.

import pandas as pd
perfList
tabl = pd.DataFrame(perfList)

Desire Output:

My output

Table should look like this


Solution

  • Here, this code will guide you for further

    Code:

    l = [('Grant June_GR-0483', [{'Chem': {'Quiz': 127.0, 'HW': 135.0, 'ATTND': 17.0, 'Exam': 46.0}}, {'Bio': {'Quiz': 184.0, 'HW': 186.0, 'ATTND': 58.0, 'Exam': 97.0}}, {'Phy': {'Quiz': 52.0, 'HW': 142.0, 'ATTND': 29.0, 'Exam': 73.0}}, {'Subject Scores': [52.615526315789474, 92.69684210526316, 54.436052631578946], 'Overall Score': 66.58, 'Grade': 'F', 'GPA': 3.3289999999999997, 'Status': 'Fail'}]), ('Dave', [{'Chem': {'Quiz': 23.0, 'HW': 55.0, 'ATTND': 67.0, 'Exam': 43.0}}, {'Subject Scores': [34.83868421052631], 'Overall Score': 34.84, 'Grade': 'F', 'GPA': 1.7420000000000002, 'Status': 'Fail'}]), ('Ben', [{'Chem': {'Quiz': 23.0, 'HW': 34.0, 'ATTND': 56.0, 'Exam': 67.0}}, {'Subject Scores': [42.125789473684215], 'Overall Score': 42.13, 'Grade': 'F', 'GPA': 2.1065, 'Status': 'Fail'}])]
    
    data = pd.DataFrame()
    indx = pd.json_normalize(dict(l)).columns
    
    for i in indx:
        df = pd.json_normalize(dict(l), record_path=i)
        
        df.index = [i]*len(df)
        df = df.mask(df.eq('None')|df.eq(None)).groupby(df.index).first()
        
        
        data=pd.concat([data, df], axis=0).fillna(0)
    
    lst = data.columns.str.split('.', expand=True).values
    data.columns = pd.MultiIndex.from_tuples([('', x[0]) if pd.isnull(x[1]) else (x[1],x[0])  for x in lst])
    

    Output:

    enter image description here