Search code examples
pythonpandasdataframegroup-by

Get max of row in column in dataframes and keep information from different row


Hope some one can help me with this. I have several dataframes that contains a value a corresponding id and name. For each id I want the max value for all dataframes but I must also now for which name this max value occured.

The example only shows two dataframes but I really have about a 100 which also contains about a 100 id's so can't just take the max and then check which values are contained in one of the dataframes and take the name from there.

I have tried with groupby but can not get it to work.

import pandas as pd
d1 = pd.DataFrame()
d2 = pd.DataFrame()
d1['id'] = [1,2,3,4,5]
d2['id'] = [1,2,3,4,5]
d1['max'] = [10,20,30,40,50]
d2['max'] = 25
d1['name'] = 'name_1'
d2['name'] = 'name_2'
df_max_list=[d1, d2]
df_max_out=pd.concat(df_max_list).groupby(level=0).max()

Got the result:

    id  max name
0   1   25  name_2
1   2   25  name_2
2   3   30  name_2
3   4   40  name_2
4   5   50  name_2

Expected result:

    id  max name
0   1   25  name_2
1   2   25  name_2
2   3   30  name_1
3   4   40  name_1
4   5   50  name_1

Solution

  • You could use .idmax() to find the maximum value for max for each group.

    # Concat and reset index
    df_tot = pd.concat(df_max_list, ignore_index=True)
    # Groupby on id and select index for maximum value of 'max' for each group
    df_max_out = df_tot.loc[df_tot.groupby('id')['max'].idxmax()]
    

    output:

       id  max    name
    5   1   25  name_2
    6   2   25  name_2
    2   3   30  name_1
    3   4   40  name_1
    4   5   50  name_1