Given this pandas dataframe with three columns, 'room_id', 'temperature' and 'State'. How do I get a forth column 'Max' indicating wehn the value is a maximum for each interval where State is True and for each room ?
117 1.489000 True
8.9 False
2.5 False
4.370000 False
4.363333 True
4.356667 True
118 4.35 True
6.648000 True
6.642667 True
7.3 False
9.4 False
5.3 True
7.1 True
What I am expecting
117 1.489000 True max
8.9 False
2.5 False
4.370000 False
4.363333 True max
4.356667 True
118 4.35 True
6.648000 True max
6.642667 True
7.3 False
9.4 False
5.3 True
7.1 True max
I used this :
Max = df_state.groupby(masque.cumsum()[~masque])['temperature'].agg(['idxmax'])
But I found this :
117 1.489000 True max
8.9 False
2.5 False
4.370000 False
4.363333 True
4.356667 True
118 4.35 True
6.648000 True max
6.642667 True
7.3 False
9.4 False
5.3 True
7.1 True max
I miss the last max of room 117 because the algorithm does not take into account the room id
You can use groupby.idxmax
to get the index of the max per custom group:
# get the indices of the max value per group
idx = (df[df['State']].groupby(['room_id', (~df['State']).cumsum()])
['temperature'].idxmax()
)
# assign the new value
df.loc[idx, 'max_temp'] = 'max'
If you want the temperature value instead of a literax 'max'
:
df.loc[idx, 'max'] = df.loc[idx, 'temperature']
Output:
room_id temperature State max max_temp
0 117 1.489000 True max 1.489000
1 117 8.900000 False NaN NaN
2 117 2.500000 False NaN NaN
3 117 4.370000 False NaN NaN
4 117 4.363333 True max 4.363333
5 117 4.356667 True NaN NaN
6 118 4.350000 True NaN NaN
7 118 6.648000 True max 6.648000
8 118 6.642667 True NaN NaN
9 118 7.300000 False NaN NaN
10 118 9.400000 False NaN NaN
11 118 5.300000 True NaN NaN
12 118 7.100000 True max 7.100000