I am trying to aggregate data that I query from a database into a specific format for a Grouped Bar Plot (ApexCharts.js).
Starting point:
all_records = [
    {'fruit': 'Apple', 'var1': 1, 'var2': 2},
    {'fruit': 'Apple', 'var1': 2, 'var2': 1},
    {'fruit': 'Banana', 'var1': 1, 'var2': 3},
    {'fruit': 'Cherry', 'var1': 0, 'var2': 1},
    {'fruit': 'Cherry', 'var1': 4, 'var2': 0},
]
The required aggregation looks like this:
{'fruit': ['Apple', 'Banana', 'Cherry'], 'var1': [3, 1, 4], 'var2': [3, 3, 1]}
To get to the desired result, I tried using a defaultdict per variable:
from collections import defaultdict
var1_dict = defaultdict(int)
for d in all_records:
    var1_dict[d['fruit']] += d['var1']
print(var1_dict)
# defaultdict(<class 'int'>, {'Apple': 3, 'Banana': 1, 'Cherry': 4})
var2_dict = defaultdict(int)
for d in all_records:
    var2_dict[d['fruit']] += d['var2']
print(var2_dict)
# defaultdict(<class 'int'>, {'Apple': 3, 'Banana': 3, 'Cherry': 1})
Then I could use list(var1_dict.keys()) to get ['Apple', 'Banana', 'Cherry'], which solves one piece of the aggregation.
But from this point I honestly don't know how to proceed.
Here's a solution with pandas dataframes:
import pandas as pd
all_records = [
    {'fruit': 'Apple', 'var1': 1, 'var2': 2},
    {'fruit': 'Apple', 'var1': 2, 'var2': 1},
    {'fruit': 'Banana', 'var1': 1, 'var2': 3},
    {'fruit': 'Cherry', 'var1': 0, 'var2': 1},
    {'fruit': 'Cherry', 'var1': 4, 'var2': 0},
]
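# Sum var1 and var2 per fruit, then turn 'fruit' back into a regular column
df = pd.DataFrame(all_records).groupby('fruit').sum().reset_index()
print(df)
# Convert to a dict of the form {column name: [values, ...]}
res = df.to_dict('list')
print(res)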
Output of print(res):
{'fruit': ['Apple', 'Banana', 'Cherry'], 'var1': [3, 1, 4], 'var2': [3, 3, 1]}
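If pandas is not an option, the defaultdict approach from the question can probably be extended to cover both variables in one pass. The following is only a minimal sketch: the helper name aggregate and its parameters key and value_fields are made up for illustration, and it assumes Python 3.7+ so that dicts keep insertion order (fruits then appear in first-seen order rather than sorted, as pandas' groupby would return them).

```python
from collections import defaultdict

all_records = [
    {'fruit': 'Apple', 'var1': 1, 'var2': 2},
    {'fruit': 'Apple', 'var1': 2, 'var2': 1},
    {'fruit': 'Banana', 'var1': 1, 'var2': 3},
    {'fruit': 'Cherry', 'var1': 0, 'var2': 1},
    {'fruit': 'Cherry', 'var1': 4, 'var2': 0},
]

def aggregate(records, key, value_fields):
    """Hypothetical helper: sum each value field per distinct key value."""
    # Outer dict: one entry per key value (e.g. per fruit);
    # inner dict: running total for each value field, starting at 0.
    totals = defaultdict(lambda: defaultdict(int))
    for rec in records:
        for field in value_fields:
            totals[rec[key]][field] += rec[field]

    # Build the column-oriented result: one list per field,
    # all in the same (first-seen) key order.
    result = {key: list(totals)}
    for field in value_fields:
        result[field] = [totals[k][field] for k in totals]
    return result

res = aggregate(all_records, 'fruit', ['var1', 'var2'])
print(res)
```

For the sample data this yields the same dictionary as the pandas version above, since the fruits already arrive in alphabetical order.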