Tags: python, python-3.x, dictionary, dictionary-comprehension

How do I create a nested dictionary from a pandas DataFrame while summing the values?


I am trying to create a nested dictionary with the office as the key and, inside it, the remaining columns summed for that office.

It should look something like this:

final_dict = {'YELLOW': {'Files Loaded': 21332, 'Files Assigned': 10613}, 'RED': ....}....

My current code is below. I'm completely stuck on how to nest and sum the values.

import pandas as pd

d = {'Office': ['Yellow', 'Yellow', 'Red', 'Red', 'Blue', 'Blue'],
     'Files Loaded': [1223, 3062, 10, 100, 1520, 75],
     'Files Assigned': [1223, 30, 1500, 10, 75, 12],
     'Files Analyzed': [1223, 15, 25, 34, 98, 1000],
     'Discrepancies Identified': [17, 30, 150, 1456, 186, 1896]}

df = pd.DataFrame(data=d)

fields = ['Files Loaded', 'Files Assigned', 'Files Analyzed', 'Discrepancies Identified']

final_dict = df.groupby('Office')[fields].apply(list).to_dict()
print(final_dict)

This prints the column names for each office rather than the summed values:

{'Blue': ['Files Loaded', 'Files Assigned', 'Files Analyzed', 'Discrepancies Identified'], 'Red': ['Files Loaded', 'Files Assigned', 'Files Analyzed', 'Discrepancies Identified'], 'Yellow': ['Files Loaded', 'Files Assigned', 'Files Analyzed', 'Discrepancies Identified']}



Solution

  • With the following input:

    import pandas as pd
    from pprint import pprint
    
    d = {'Office': ['Yellow', 'Yellow', 'Red', 'Red', 'Blue', 'Blue'], 
         'Files Loaded': [1223, 3062, 10, 100, 1520, 75],
         'Files Assigned': [1223, 30, 1500, 10, 75, 12],
         'Files Analyzed': [1223, 15, 25, 34, 98, 1000], 
         'Discrepancies Identified': [17, 30, 150, 1456, 186, 1896]}
    df = pd.DataFrame(data=d)
    

    We can use pandas groupby with the aggregation function agg('sum') to total each column per office. Calling to_dict with the 'index' orientation on the result then returns a dictionary keyed by Office, where each value is itself a dictionary mapping the column name to the summed value for that office.

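    # Sum every numeric column within each office.
    data = df.groupby('Office').agg('sum')
    # orient='index' returns {row label: {column name: value}}.
    answer = data.to_dict(orient='index')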
    
    pprint(answer)
    

    Output:

    {'Blue': {'Discrepancies Identified': 2082,
              'Files Analyzed': 1098,
              'Files Assigned': 87,
              'Files Loaded': 1595},
     'Red': {'Discrepancies Identified': 1606,
             'Files Analyzed': 59,
             'Files Assigned': 1510,
             'Files Loaded': 110},
     'Yellow': {'Discrepancies Identified': 47,
                'Files Analyzed': 1238,
                'Files Assigned': 1253,
                'Files Loaded': 4285}}
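
The question's target output uses upper-case office names and only some of the columns. A minimal sketch of one way to get that shape, assuming the same sample data as above: select the wanted columns before summing, then upper-case the index before calling to_dict. The choice of the two columns in fields is only an illustration taken from the question's example. The original attempt returned column names because calling list on a DataFrame iterates over its column labels, not its values.

```python
import pandas as pd

d = {'Office': ['Yellow', 'Yellow', 'Red', 'Red', 'Blue', 'Blue'],
     'Files Loaded': [1223, 3062, 10, 100, 1520, 75],
     'Files Assigned': [1223, 30, 1500, 10, 75, 12],
     'Files Analyzed': [1223, 15, 25, 34, 98, 1000],
     'Discrepancies Identified': [17, 30, 150, 1456, 186, 1896]}
df = pd.DataFrame(data=d)

# Illustrative subset of columns (assumption: only these two are wanted).
fields = ['Files Loaded', 'Files Assigned']

# Sum only the selected columns within each office.
totals = df.groupby('Office')[fields].sum()

# Upper-case the office names so the keys look like 'YELLOW', 'RED', ...
totals.index = totals.index.str.upper()

# One inner dictionary per office: {column name: summed value}.
final_dict = totals.to_dict(orient='index')
print(final_dict)
```

With the sample data this gives 4285 files loaded and 1253 files assigned for 'YELLOW'. The totals shown in the question (21332 and 10613) presumably come from a larger data set.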