How to keep all the columns in a groupby in pandas?


I have a DataFrame with several columns. I want to group the rows by one column but keep the remaining columns, specifically because I want the row with the maximum value in each group.

ID academic_level sex location
1 9 1 3
1 1 2 3
2 5 1 4
2 7 2 4

I would like to have the following:

ID academic_level sex location
1 9 1 3
2 7 2 4

That is, I want to group by ID, take the row with the maximum academic_level in each group, and keep the other columns from that row.

Thanks


Solution

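  • You can combine groupby, idxmax() and loc to get your desired result. idxmax() returns, for each group, the index label of the row holding the maximum, and loc then selects those complete rows: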

    import pandas as pd
    
    df = pd.DataFrame({'ID': [1, 1, 2, 2],
                       'academic_level': [9, 1, 5, 7],
                       'sex': [1, 2, 1, 2],
                       'location': [3, 3, 4, 4]})
    
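    # For each ID, find the index label of the row with the highest
    # academic_level, then select those full rows with loc.
    df_result = df.loc[df.groupby('ID')['academic_level'].idxmax()]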
    

    df_result holds:

       ID  academic_level  sex  location
    0   1               9    1         3
    3   2               7    2         4
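
To make the two steps in the answer above easier to follow, here is a minimal sketch that splits them apart and compares the result against one possible alternative (sort, then drop duplicates). The names idx, by_idxmax and by_sort are made up for illustration, and the final reset_index(drop=True) is an optional extra in case you want the index renumbered from 0.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2],
                   'academic_level': [9, 1, 5, 7],
                   'sex': [1, 2, 1, 2],
                   'location': [3, 3, 4, 4]})

# Step 1: for each ID, get the index label of the row with the highest academic_level.
idx = df.groupby('ID')['academic_level'].idxmax()

# Step 2: select those complete rows, then renumber the index from 0 (optional).
by_idxmax = df.loc[idx].reset_index(drop=True)

# Alternative sketch: sort by academic_level descending, keep the first row per ID,
# then restore ID order.
by_sort = (df.sort_values('academic_level', ascending=False)
             .drop_duplicates('ID')
             .sort_values('ID')
             .reset_index(drop=True))
```

On this sample data both approaches select the same rows. If several rows in a group share the maximum, idxmax() returns the first one, while the sort-based variant may pick a different row among the ties, so the two are only interchangeable when ties do not matter.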