I have a large DataFrame that looks like this:
price type status
2 shoes none
3 clothes none
6 clothes none
3 shoes none
4 shoes none
6 shoes none
2 clothes none
3 shoes none
6 clothes none
8 clothes done
Basically, I want to calculate the mean and median of "price" per "type" each time the "status" is "done". So far, I have built a group key from the "done" rows and then calculated the mean and median of each group, as in the script below:
g = df['status'].eq('done').iloc[::-1].cumsum().iloc[::-1]
grouper = df.groupby(g)
df_statistics = grouper.agg(
    mean=('price', 'mean'),
    median=('price', 'median'),
)
df_freq = df.groupby(g).apply(lambda x: x['price'].value_counts().idxmax())
How can I add "type" as a second grouping key, so that the script also computes the mean and median of each group per "type"?
I think you need to pass the group key together with the column name in a list to groupby:
grouper = df.groupby([g, 'type'])
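For reference, here is a minimal end-to-end sketch built on the sample data from the question. The name `block` given to the group key is my own choice for readability (an assumption, not something from the original script); without the rename, the key would keep the name `status`.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'price': [2, 3, 6, 3, 4, 6, 2, 3, 6, 8],
    'type': ['shoes', 'clothes', 'clothes', 'shoes', 'shoes',
             'shoes', 'clothes', 'shoes', 'clothes', 'clothes'],
    'status': ['none'] * 9 + ['done'],
})

# Reverse cumulative sum: every row gets the id of the next "done" row
# at or below it, so each block ends with its "done" row.
# 'block' is a hypothetical name chosen for this sketch.
g = df['status'].eq('done').iloc[::-1].cumsum().iloc[::-1].rename('block')

# Group by the block id and by type at the same time
grouper = df.groupby([g, 'type'])

# Named aggregation: one row per (block, type) pair
df_statistics = grouper.agg(
    mean=('price', 'mean'),
    median=('price', 'median'),
)

# Most frequent price per (block, type) pair
df_freq = grouper['price'].agg(lambda s: s.value_counts().idxmax())

print(df_statistics)
print(df_freq)
```

The result is indexed by a MultiIndex of (block, type). If you prefer ordinary columns, calling `df_statistics.reset_index()` should flatten it.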