I have a hopefully easy problem for some help stack helpers! I have a dataframe:
df = pd.DataFrame({'Quantity': [2, 3, 4, 1, 2, 1, 4, 5],
'User': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
'Price': [5, 3, 2, 6, 2, 3, 4, 5],
'Shop': ['X', 'X', 'X', 'Y', 'Z', 'Z', 'X', 'Y'],
'Day': ['M', 'T', 'W', 'W', 'M', 'T', 'M', 'W']
})
| QuantityUser Price Shop Day
0 2 A 5 X M
1 3 A 3 X T
2 4 B 2 X W
3 1 B 6 Y W
4 2 B 2 Z M
5 1 C 3 Z T
6 4 C 4 X M
7 5 C 5 Y W
My trouble comes when I try and aggregate it by shop and day. I'm hoping for the users in a shop by day and the average spent in that shop on that day. So in SQL it would be: AVG(Quantity*Price)
I have the first part:
df.groupby(by=['Shop','Day']).agg({'User': 'count'})
But my only solution to the other aggregation is first create a column and then aggregate it.
df['Spend'] = df['Price'] * df['Quantity']
df.groupby(by=['Shop','Day']).agg({'User': 'count' ,'Spend' :'mean' })
Is there a better method I am missing? Ideally I want the aggregation to happen alongside the Count
aggregate without the need for a new column created.
You can achieve that without creating new column like so:
result = df.groupby(by=['Shop', 'Day']).agg(
User=('User', 'count'),
Spend=('Price', lambda x: (x * df.loc[x.index, 'Quantity']).mean())
)
Output is the same as yours:
User Spend
Shop Day
X M 2 13.0
T 1 9.0
W 1 8.0
Y W 2 15.5
Z M 1 4.0
T 1 3.0