Tags: python, pandas, group-by, max

Find index of max() after groupby


Is it possible to find the index after a groupby? In other words, I use groupby to find the maximum values, but I would also like to find the corresponding index in the original dataset and add it as a separate column.

import pandas as pd

data = pd.DataFrame(
    [[pd.Timestamp('2022-08-05 10:11:04'), 140, 120],
    [pd.Timestamp('2022-08-05 10:11:05'), 400, 155],
    [pd.Timestamp('2022-08-05 10:13:06'), 400, 160],
    [pd.Timestamp('2022-08-05 10:15:07'), 100, 155],
    [pd.Timestamp('2022-08-05 10:15:08'), 430, 160],
    [pd.Timestamp('2022-09-05 10:17:09'), 430, 130],
    [pd.Timestamp('2022-09-07 10:17:10'), 430, 131],
    [pd.Timestamp('2022-09-07 10:17:11'), 430, 170],
    [pd.Timestamp('2022-09-07 10:18:06'), 430, 156],
    [pd.Timestamp('2022-09-07 10:19:07'), 130, 155],
    [pd.Timestamp('2022-09-07 10:19:08'), 130, 160],
    [pd.Timestamp('2022-09-07 10:19:09'), 430, 130],
    [pd.Timestamp('2022-09-07 10:20:10'), 430, 131],
    [pd.Timestamp('2022-09-07 10:20:11'), 130, 170]],
    columns=['timestamp', 'power', 'heart rate'])
print(data)

So I want to know the maximum heart rate per date, along with the corresponding index in the data DataFrame.


Solution

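  • Use GroupBy.transform with SeriesGroupBy.idxmax to get, for every row, the index label of the maximum heart rate within its group (here, its calendar date). When the maximum is tied, idxmax returns the first occurrence: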

    data['new'] = data.groupby(data['timestamp'].dt.date)['heart rate'].transform('idxmax')
    print(data)
                 timestamp  power  heart rate  new
    0  2022-08-05 10:11:04    140         120    2
    1  2022-08-05 10:11:05    400         155    2
    2  2022-08-05 10:13:06    400         160    2
    3  2022-08-05 10:15:07    100         155    2
    4  2022-08-05 10:15:08    430         160    2
    5  2022-09-05 10:17:09    430         130    5
    6  2022-09-07 10:17:10    430         131    7
    7  2022-09-07 10:17:11    430         170    7
    8  2022-09-07 10:18:06    430         156    7
    9  2022-09-07 10:19:07    130         155    7
    10 2022-09-07 10:19:08    130         160    7
    11 2022-09-07 10:19:09    430         130    7
    12 2022-09-07 10:20:10    430         131    7
    13 2022-09-07 10:20:11    130         170    7
    

    If you need only the row holding the maximum for each group:

    df = data.loc[data.groupby(data['timestamp'].dt.date)['heart rate'].idxmax()]
    print(df)
                timestamp  power  heart rate
    2 2022-08-05 10:13:06    400         160
    5 2022-09-05 10:17:09    430         130
    7 2022-09-07 10:17:11    430         170
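
The two snippets above can also be combined into a per-date summary. The following is a minimal sketch, not part of the original answer: it uses a shortened sample modelled on the question's data, and the names `dates`, `by_date`, `summary` and `tied` are chosen here for illustration. It aggregates with both `max` and `idxmax`, then shows one way to keep every tied row, since `idxmax` reports only the first one.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data (illustrative subset).
data = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2022-08-05 10:11:04",
                "2022-08-05 10:13:06",
                "2022-08-05 10:15:08",
                "2022-09-07 10:17:11",
                "2022-09-07 10:18:06",
            ]
        ),
        "heart rate": [120, 160, 160, 170, 156],
    }
)

# Group key: the calendar date of each timestamp.
dates = data["timestamp"].dt.date.rename("date")
by_date = data.groupby(dates)["heart rate"]

# One row per date: the maximum and the index label of its first occurrence.
summary = by_date.agg(["max", "idxmax"])
print(summary)

# idxmax reports only the first maximal row per group. To keep every tied row,
# compare each value with its group's maximum instead.
tied = data[data["heart rate"] == by_date.transform("max")]
print(tied.index.tolist())  # [1, 2, 3]
```

Whether the first occurrence or all tied rows is the right answer depends on the use case; the question's data contains ties on both 2022-08-05 and 2022-09-07, so the distinction matters there.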