I've got a dataframe that looks like this (code is at the bottom, if you care):
A B C D E
Year Category
1980 Alpha 100 800 10 15 1.2350
Bravo 200 700 20 25 2.3550
Charlie 300 600 30 35 3.3600
Delta 400 500 40 45 4.4448
1981 Alpha 500 400 50 55 5.5000
Bravo 600 300 60 65 6.0000
Charlie 700 200 70 75 7.2000
Delta 800 100 80 85 8.0000
For each year, I'm trying to get the sum (and average) of columns A-D for all the rows where the value in E is larger than the median for that year.
So, for 1980 only Charlie & Delta qualify as having a value in column E that is greater than the median for that year. Hence the result for column A should be ...sum = 300 + 400 = 700 ... average = mean(300, 400) = 350
The closest I can get here is using groupby / median
df.groupby(['Year']).median()
...but then I'm not quite sure what to do with the outputs. This gives me the medians for each year but I can't then easily link back these results to my initial dataframe.
Dataframe code:
data = {'A': [100,200,300,400,500,600,700,800],
'B': [800,700,600,500,400,300,200,100],
'C': [10,20,30,40,50,60,70,80],
'D': [15,25,35,45,55,65,75,85],
'E': [1.235,2.355,3.36,4.4448,5.5,6,7.2,8]}
idx = pd.MultiIndex.from_product([[1980, 1981],['Alpha','Bravo', 'Charlie','Delta']], names=['Year','Category'])
df = pd.DataFrame(data, index=idx, columns=['A', 'B', 'C', 'D', 'E'])
IIUC, you can do a groupby
on the first level and transform median
on column E , then compare and drop off rows where E exceeds the median and sum on level=0:
s = df['E'].gt(df.groupby(level=0)['E'].transform('median'))
df.drop('E',1)[s].sum(level=0)
A B C D
Year
1980 700 1100 70 80
1981 1500 300 150 160
For having mean and sum stacked side by side you can do the same with concat and add_suffix
to identify:
s = df['E'].gt(df.groupby(level=0)['E'].transform('median'))
m = df.drop('E',1)[s]
pd.concat((m.sum(level=0).add_suffix('_sum'),m.mean(level=0).add_suffix('_avg')),axis=1)
A_sum B_sum C_sum D_sum A_avg B_avg C_avg D_avg
Year
1980 700 1100 70 80 350 550 35 40
1981 1500 300 150 160 750 150 75 80