
How to aggregate and sum list of dictionaries for use in bar plot


I am trying to aggregate data that I query from a database into a specific format for a Grouped Bar Plot (ApexCharts.js).

Starting point:

all_records = [
  {'fruit': 'Apple', 'var1': 1, 'var2': 2},
  {'fruit': 'Apple', 'var1': 2, 'var2': 1},
  {'fruit': 'Banana', 'var1': 1, 'var2': 3},
  {'fruit': 'Cherry', 'var1': 0, 'var2': 1},
  {'fruit': 'Cherry', 'var1': 4, 'var2': 0}
]

The required aggregation looks like this:

{'fruit': ['Apple', 'Banana', 'Cherry'], 'var1': [3, 1, 4], 'var2': [3, 3, 1]}

To get the desired result, I tried summing each variable per fruit with collections.defaultdict:

from collections import defaultdict

# Sum var1 per fruit
var1_dict = defaultdict(int)
for d in all_records:
    var1_dict[d['fruit']] += d['var1']

print(var1_dict)
# defaultdict(<class 'int'>, {'Apple': 3, 'Banana': 1, 'Cherry': 4})

# Sum var2 per fruit
var2_dict = defaultdict(int)
for d in all_records:
    var2_dict[d['fruit']] += d['var2']

print(var2_dict)
# defaultdict(<class 'int'>, {'Apple': 3, 'Banana': 3, 'Cherry': 1})

Then I could use list(var1_dict.keys()) to get ['Apple', 'Banana', 'Cherry'], which solves one piece of the aggregation.

But from there I honestly don't know how to combine the pieces into the required format.
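The two defaultdicts already hold all the totals, so one way to finish without pandas is to take the fruit names from var1_dict and look up each fruit's total in both dicts. This is a minimal sketch, not the accepted answer below; it assumes every record has the keys 'fruit', 'var1' and 'var2', and it relies on dicts preserving insertion order (guaranteed since Python 3.7), so fruits come out in order of first appearance, which happens to be alphabetical for this data. Both sums are also folded into a single loop.

```python
from collections import defaultdict

all_records = [
    {'fruit': 'Apple', 'var1': 1, 'var2': 2},
    {'fruit': 'Apple', 'var1': 2, 'var2': 1},
    {'fruit': 'Banana', 'var1': 1, 'var2': 3},
    {'fruit': 'Cherry', 'var1': 0, 'var2': 1},
    {'fruit': 'Cherry', 'var1': 4, 'var2': 0},
]

# One pass: accumulate both variables per fruit
var1_dict = defaultdict(int)
var2_dict = defaultdict(int)
for d in all_records:
    var1_dict[d['fruit']] += d['var1']
    var2_dict[d['fruit']] += d['var2']

# Fruit names in order of first appearance (dict insertion order)
fruits = list(var1_dict.keys())

# Build parallel lists, looking up each fruit's total in both dicts
result = {
    'fruit': fruits,
    'var1': [var1_dict[f] for f in fruits],
    'var2': [var2_dict[f] for f in fruits],
}
print(result)
# {'fruit': ['Apple', 'Banana', 'Cherry'], 'var1': [3, 1, 4], 'var2': [3, 3, 1]}
```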


Solution

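  • Here's a solution with pandas: group the records by fruit, sum the numeric columns, and convert the result to a dict of lists with to_dict('list'):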

    import pandas as pd
    
    all_records = [
      {'fruit': 'Apple', 'var1': 1, 'var2': 2},
      {'fruit': 'Apple', 'var1': 2, 'var2': 1},
      {'fruit': 'Banana', 'var1': 1, 'var2': 3},
      {'fruit': 'Cherry', 'var1': 0, 'var2': 1},
      {'fruit': 'Cherry', 'var1': 4, 'var2': 0}
    ]
    
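    # Sum var1/var2 per fruit (groupby sorts the keys by default);
    # reset_index() turns 'fruit' back into an ordinary column
    df = pd.DataFrame(all_records).groupby('fruit').sum().reset_index()
    print(df)
    
    # The 'list' orient maps each column name to a list of its values
    res = df.to_dict('list')
    print(res)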
    
    

    Output of print(res):

    {'fruit': ['Apple', 'Banana', 'Cherry'], 'var1': [3, 1, 4], 'var2': [3, 3, 1]}
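If pulling in pandas only for this step is undesirable, the same aggregation can be sketched in plain Python for any number of value columns. The function name aggregate_for_bar_plot and its key and value_fields parameters are hypothetical, chosen for illustration; the sketch assumes every record contains the key field and all value fields, and that the key values are mutually sortable. Keys are sorted to mirror pandas' default groupby(sort=True) ordering.

```python
from collections import defaultdict


def aggregate_for_bar_plot(records, key, value_fields):
    """Hypothetical helper: sum value_fields per distinct key value.

    Returns a dict of parallel lists, e.g. {key: [...], field: [...]},
    with keys sorted like pandas' default groupby.
    """
    # Each distinct key gets a running total per value field
    totals = defaultdict(lambda: [0] * len(value_fields))
    for rec in records:
        sums = totals[rec[key]]
        for i, field in enumerate(value_fields):
            sums[i] += rec[field]

    keys = sorted(totals)
    result = {key: keys}
    for i, field in enumerate(value_fields):
        result[field] = [totals[k][i] for k in keys]
    return result


all_records = [
    {'fruit': 'Apple', 'var1': 1, 'var2': 2},
    {'fruit': 'Apple', 'var1': 2, 'var2': 1},
    {'fruit': 'Banana', 'var1': 1, 'var2': 3},
    {'fruit': 'Cherry', 'var1': 0, 'var2': 1},
    {'fruit': 'Cherry', 'var1': 4, 'var2': 0},
]

res = aggregate_for_bar_plot(all_records, 'fruit', ['var1', 'var2'])
print(res)
# {'fruit': ['Apple', 'Banana', 'Cherry'], 'var1': [3, 1, 4], 'var2': [3, 3, 1]}
```

Passing the value columns explicitly means extra fields in the database rows (IDs, timestamps) are simply ignored rather than summed.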