Hope some one can help me with this. I have several dataframes that contains a value a corresponding id and name. For each id I want the max value for all dataframes but I must also now for which name this max value occured.
The example only shows two dataframes but I really have about a 100 which also contains about a 100 id's so can't just take the max and then check which values are contained in one of the dataframes and take the name from there.
I have tried with groupby but can not get it to work.
import pandas as pd
d1 = pd.DataFrame()
d2 = pd.DataFrame()
d1['id'] = [1,2,3,4,5]
d2['id'] = [1,2,3,4,5]
d1['max'] = [10,20,30,40,50]
d2['max'] = 25
d1['name'] = 'name_1'
d2['name'] = 'name_2'
df_max_list=[d1, d2]
df_max_out=pd.concat(df_max_list).groupby(level=0).max()
Got the result:
id max name
0 1 25 name_2
1 2 25 name_2
2 3 30 name_2
3 4 40 name_2
4 5 50 name_2
Expected result:
id max name
0 1 25 name_2
1 2 25 name_2
2 3 30 name_1
3 4 40 name_1
4 5 50 name_1
You could use .idmax()
to find the maximum value for max for each group.
# Concat and reset index
df_tot = pd.concat(df_max_list, ignore_index=True)
# Groupby on id and select index for maximum value of 'max' for each group
df_max_out = df_tot.loc[df_tot.groupby('id')['max'].idxmax()]
output:
id max name
5 1 25 name_2
6 2 25 name_2
2 3 30 name_1
3 4 40 name_1
4 5 50 name_1