Let's say I have the following DataFrame:
import pandas as pd

df = pd.DataFrame.from_dict({'class': ['A', 'A', 'A', 'B', 'B'],
                            'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                            'gender': ['m', 'f', 'm', 'f', 'f'],
                            'grade': [2, 3, 1, 2, 4]})
>>> print(df)
  class   name gender  grade
0     A   max1      m      2
1     A  lisa1      f      3
2     A   max3      m      1
3     B  lisa2      f      2
4     B  lisa3      f      4
I want to get the best student(s) of each class (a lower grade is better):
  class  grade   name gender
0     A      1   max3      m
1     B      2  lisa2      f
How would I achieve that with pandas? Also, if the best grade within a class is not unique, how can I either select the first occurrence of the best grade or aggregate all of the best students into a list?
To clarify, assume that lisa3 has grade 2 instead of 4. Then the result I want would be:
  class  grade            name  gender
0     A      1          [max3]     [m]
1     B      2  [lisa2, lisa3]  [f, f]
You can use pandas.DataFrame.groupby to group the students by class, use transform to get the minimum grade of each class, keep only the rows whose grade equals that minimum, and then collect the remaining names and genders of each class into lists using pandas.Series.tolist.
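import pandas as pd

df = pd.DataFrame.from_dict({'class': ['A', 'A', 'A', 'B', 'B'],
                            'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                            'gender': ['m', 'f', 'm', 'f', 'f'],
                            'grade': [2, 3, 1, 2, 2]})

# keep only the rows whose grade equals the minimum grade of their class
temp = df[df.grade == df.groupby(['class'])['grade'].transform('min')]
# collect the names and genders of each (class, grade) pair into lists
temp.groupby(['class', 'grade']).agg(pd.Series.tolist)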
This gives the expected output (with class and grade as the index):
                       name  gender
class grade
A     1              [max3]     [m]
      2      [lisa2, lisa3]  [f, f]
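To get exactly the flat layout from the question (class and grade as ordinary columns) and to also cover the "first occurrence" variant the question asks about, a minimal sketch could look like this. It assumes the tie-containing data from above. It swaps pd.Series.tolist for the builtin list (which aggregates the same way) and adds reset_index and SeriesGroupBy.idxmin (which returns the index label of the first minimum in each group); none of these three appear in the original answer.

```python
import pandas as pd

# Same data as above: lisa3 has grade 2, so class B has a tie for best
df = pd.DataFrame({'class': ['A', 'A', 'A', 'B', 'B'],
                   'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                   'gender': ['m', 'f', 'm', 'f', 'f'],
                   'grade': [2, 3, 1, 2, 2]})

# Variant 1: keep all tied best students, aggregated into lists,
# then move the (class, grade) index levels back into ordinary columns.
best = df[df['grade'] == df.groupby('class')['grade'].transform('min')]
as_lists = best.groupby(['class', 'grade']).agg(list).reset_index()
print(as_lists)

# Variant 2: keep only the first best student of each class.
# idxmin gives the index label of the first row holding the group minimum.
first_best = df.loc[df.groupby('class')['grade'].idxmin()]
first_best = first_best[['class', 'grade', 'name', 'gender']].reset_index(drop=True)
print(first_best)
```

With the tie in class B, variant 2 returns lisa2 only, because she appears before lisa3 in the original row order.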
SeriesGroupBy.transform (here called on the grade column of the grouped DataFrame) computes the minimum of each group and returns a Series with the same index as the original DataFrame, filled with the transformed values. In our case, the min of each class is returned for every index value that belongs to that class.
So df.groupby(['class'])['grade'].transform('min') gives us:
0 1
1 1
2 1
3 2
4 2
where 1 and 2 are the minimum grades of classes A and B respectively, repeated for every row that belongs to that class.
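To see how this differs from a plain per-group aggregation and how the boolean filter in the answer is built from it, here is a small sketch on the same data. The variable names per_class, class_min and mask are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'A', 'A', 'B', 'B'],
                   'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                   'gender': ['m', 'f', 'm', 'f', 'f'],
                   'grade': [2, 3, 1, 2, 2]})

# Plain aggregation: one value per class, indexed by class label
per_class = df.groupby('class')['grade'].min()

# transform: one value per original row, aligned with df's index
class_min = df.groupby('class')['grade'].transform('min')

# True where a student's grade equals their class minimum; ties stay True
mask = df['grade'] == class_min
best = df[mask]
print(mask.tolist())
print(best)
```

Because class_min shares df's index, the comparison is a simple element-wise check row by row, which is why the filter keeps both lisa2 and lisa3.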