Is it possible to find the index after a groupby? In other words, I use a groupby to find the max values, but I would also like to find the corresponding index in the original data set and add it to a separate column.
import pandas as pd

data = pd.DataFrame(
[[pd.Timestamp('2022-08-05 10:11:04'), 140, 120],
[pd.Timestamp('2022-08-05 10:11:05'), 400, 155],
[pd.Timestamp('2022-08-05 10:13:06'), 400, 160],
[pd.Timestamp('2022-08-05 10:15:07'), 100, 155],
[pd.Timestamp('2022-08-05 10:15:08'), 430, 160],
[pd.Timestamp('2022-09-05 10:17:09'), 430, 130],
[pd.Timestamp('2022-09-07 10:17:10'), 430, 131],
[pd.Timestamp('2022-09-07 10:17:11'), 430, 170],
[pd.Timestamp('2022-09-07 10:18:06'), 430, 156],
[pd.Timestamp('2022-09-07 10:19:07'), 130, 155],
[pd.Timestamp('2022-09-07 10:19:08'), 130, 160],
[pd.Timestamp('2022-09-07 10:19:09'), 430, 130],
[pd.Timestamp('2022-09-07 10:20:10'), 430, 131],
[pd.Timestamp('2022-09-07 10:20:11'), 130, 170]],
columns=['timestamp', 'power', 'heart rate'])
print(data)
So I want to know the max heart rate per date, together with the corresponding index in the data DataFrame.
Use GroupBy.transform with DataFrameGroupBy.idxmax to get, for every row, the index of the maximal value in column heart rate within its group (here, the date):
data['new'] = data.groupby(data['timestamp'].dt.date)['heart rate'].transform('idxmax')
print (data)
             timestamp  power  heart rate  new
0  2022-08-05 10:11:04    140         120    2
1  2022-08-05 10:11:05    400         155    2
2  2022-08-05 10:13:06    400         160    2
3  2022-08-05 10:15:07    100         155    2
4  2022-08-05 10:15:08    430         160    2
5  2022-09-05 10:17:09    430         130    5
6  2022-09-07 10:17:10    430         131    7
7  2022-09-07 10:17:11    430         170    7
8  2022-09-07 10:18:06    430         156    7
9  2022-09-07 10:19:07    130         155    7
10 2022-09-07 10:19:08    130         160    7
11 2022-09-07 10:19:09    430         130    7
12 2022-09-07 10:20:10    430         131    7
13 2022-09-07 10:20:11    130         170    7
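Two details of the output are worth checking on a small case: idxmax returns index labels (not positions), and on ties it returns the first occurrence, which is why the first group above gets 2 rather than 4. The frame below is a made-up, shortened example with a deliberately non-default index; the names df, day and idx_of_max are only illustrative and not part of the original question.

```python
import pandas as pd

# Hypothetical mini data set: two rows tie for the max (160) on 2022-08-05.
# The index is set to 10, 20, 30, 40 to show that labels are returned.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2022-08-05 10:11:04",
                "2022-08-05 10:13:06",
                "2022-08-05 10:15:08",
                "2022-09-07 10:17:11",
            ]
        ),
        "heart rate": [120, 160, 160, 170],
    },
    index=[10, 20, 30, 40],
)

# Group by calendar date and broadcast the label of the max back to each row.
day = df["timestamp"].dt.date
df["idx_of_max"] = df.groupby(day)["heart rate"].transform("idxmax")

print(df)
```

For the tied group the new column holds the label of the first of the two maximal rows (20, not 30).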
If you need only the row with the maximal value per group:
df = data.loc[data.groupby(data['timestamp'].dt.date)['heart rate'].idxmax()]
print (df)
            timestamp  power  heart rate
2 2022-08-05 10:13:06    400         160
5 2022-09-05 10:17:09    430         130
7 2022-09-07 10:17:11    430         170
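If a compact per-date summary is enough, the max and its index can also be collected in one step with named aggregation. This is only a sketch on a shortened copy of the question's data; the names day, summary, max_hr and row_index are chosen here for illustration.

```python
import pandas as pd

# Shortened version of the question's data (default RangeIndex 0..4).
data = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2022-08-05 10:11:04",
                "2022-08-05 10:13:06",
                "2022-09-05 10:17:09",
                "2022-09-07 10:17:10",
                "2022-09-07 10:17:11",
            ]
        ),
        "heart rate": [120, 160, 130, 131, 170],
    }
)

# One row per date: the max heart rate and the index label where it occurs.
day = data["timestamp"].dt.date.rename("date")
summary = data.groupby(day)["heart rate"].agg(max_hr="max", row_index="idxmax")

print(summary)
```

The row_index column can then be passed to data.loc[...] to pull the full original rows, as in the answer above.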