I cannot get my head around how to use groupby to solve the following example:
df = pd.DataFrame(
{
'kitchen': ['galley', 'house', 'restaurant', 'caterer'] * 3,
'products': ['chocolate', 'tart', 'pie', ] * 4,
'menu_a': [pd.np.random.randint(100000, 999999) for _ in range(12)],
'menu_b': [pd.np.random.randint(100000, 999999) for _ in range(12)],
'menu_c': [pd.np.random.randint(100000, 999999) for _ in range(12)]
}
).set_index(['kitchen', 'products']).sort_index()
df
What I want to do is replace the "pie" and "tart" rows of each kitchen with the sum of pie+tart for each kitchen.
So for example, in the galley
kitchen
, the new row under products
would be pastries
and the value under menu_a
would be 333163+612456 = 945619
for each of the kitchen
x product
x menu
s.
I've tried many versions stack()
unstack()
and groupby()
mixed togther but cannot quite get the result. The alternative is to do this iteratively/apply()
'd outside, which is gross, and this is a frequent problem I encounter. Would like to know how to do it right.
Select rows by second level, sum and add second level:
df1 = (df.loc[pd.IndexSlice[:, ['pie','tart']], :]
.sum(level=0)
.assign(products='total')
.set_index('products', append=True))
Then concat to original and remove used values by list:
df = pd.concat([df, df1]).drop(['pie','tart'], level=1).sort_index()
print (df)
menu_a menu_b menu_c
kitchen products
caterer chocolate 907615 167480 921843
total 749664 786464 872046
galley chocolate 939850 382545 525525
total 1204359 907760 1267475
house chocolate 701797 106570 572014
total 1215235 1058951 812935
restaurant chocolate 734501 637600 216367
total 1846097 345020 517969