I have a DataFrame with several columns. I want to group by one column but keep the other columns, specifically because I want to get the row with the maximum value.
ID | academic_level | sex | location |
---|---|---|---|
1 | 9 | 1 | 3 |
1 | 1 | 2 | 3 |
2 | 5 | 1 | 4 |
2 | 7 | 2 | 4 |
I would like to have the following:
ID | academic_level | sex | location |
---|---|---|---|
1 | 9 | 1 | 3 |
2 | 7 | 2 | 4 |
That is, I want to group by `ID`, take the row with the maximum `academic_level`, and keep the rest of the variables from that row.
Thanks
You can use `idxmax()`, `loc` and `groupby` to get your desired result:
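```python
import pandas as pd
df = pd.DataFrame({'ID': [1, 1, 2, 2],
                   'academic_level': [9, 1, 5, 7],
                   'sex': [1, 2, 1, 2],
                   'location': [3, 3, 4, 4]})

# idxmax() returns, per ID, the index label of the row with the highest academic_level;
# loc then selects those complete rows, so sex and location are kept
df_result = df.loc[df.groupby('ID')['academic_level'].idxmax()]
print(df_result)
```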
`df_result` holds:
```
   ID  academic_level  sex  location
0   1               9    1         3
3   2               7    2         4
```
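To see why this works, it can help to look at the intermediate result. The sketch below reuses the same sample data: `idxmax()` gives, for each `ID`, the index label of the row with the largest `academic_level` (the first such row if there is a tie), and `loc` pulls those full rows. For comparison it also shows two different techniques, not part of the answer above: `sort_values` plus `drop_duplicates(keep='last')`, and a boolean mask built with `transform('max')`, which keeps every row tied for the maximum instead of only one. The variable names `best_rows`, `df_alt` and `df_ties` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2],
                   'academic_level': [9, 1, 5, 7],
                   'sex': [1, 2, 1, 2],
                   'location': [3, 3, 4, 4]})

# Step 1: for each ID, the index label of the row with the highest academic_level
best_rows = df.groupby('ID')['academic_level'].idxmax()
print(best_rows)  # ID 1 -> row 0, ID 2 -> row 3

# Step 2: select those complete rows, so the other columns come along
df_result = df.loc[best_rows]

# Alternative technique: stable sort by academic_level, keep the last row per ID,
# then restore the original row order
df_alt = (df.sort_values('academic_level', kind='stable')
            .drop_duplicates('ID', keep='last')
            .sort_index())
print(df_alt)  # same rows (0 and 3) as df_result

# Variant: keep every row whose academic_level equals its group's maximum
df_ties = df[df['academic_level'] == df.groupby('ID')['academic_level'].transform('max')]
print(df_ties)
```

With this data all three give rows 0 and 3. They differ only when an `ID` has several rows sharing the maximum: `idxmax()` keeps the first of them, the stable sort with `keep='last'` keeps the last, and the `transform('max')` mask keeps all of them.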