I have data that looks like this:
system question answer grade rank
sys1 q1 a1 A 5
sys1 q1 a1 B 10
sys2 q1 a1 C 1
sys2 q1 a1 D 11
My goal is to group by question and answer, taking the best grade and rank, as well as the system associated with the best grade. Below is an example of what I'd like to execute:
>>df.groupby(['question', 'answer']).agg({'grade': np.min, 'rank': np.min, 'system': ???).reset_index()
question answer grade rank system
q1 a1 A 1 sys1
I'm getting what I want for grade and rank as I simply take the minimums. What's the best way to get the desired system?
I think you can use sort_values
first and then aggregate last
:
print (df)
system question answer grade rank
0 sys1 q1 a1 A 5
1 sys2 q1 a1 B 10
2 sys3 q1 a1 C 1
3 sys4 q1 a1 D 11
df = df.sort_values('grade')
.groupby(['question', 'answer'])
.agg({'grade': np.min, 'rank': np.min, 'system':'last'})
.reset_index()
print (df)
question answer system rank grade
0 q1 a1 sys4 1 A
I m not sure, but maybe need first
:
df = df.sort_values('grade')
.groupby(['question', 'answer'])
.agg({'grade': np.min, 'rank': np.min, 'system':'first'})
.reset_index()
print (df)
question answer system rank grade
0 q1 a1 sys1 1 A