Python: Running maximum by another column?

I have a dataframe like this, which tracks the value of certain items (ids) over time:

mytime=np.tile( np.arange(0,10) , 2 )
myids=np.repeat( [123,456], [10,10] )
myvalues=np.random.random_integers(20,30,10*2)

df=pd.DataFrame()
df['myids']=myids
df['mytime']=mytime
df['myvalues']=myvalues



+-------+--------+----------+--+--+
| myids | mytime | myvalues |  |  |
+-------+--------+----------+--+--+
| 123   | 0      | 29       |  |  |
+-------+--------+----------+--+--+
| 123   | 1      | 23       |  |  |
+-------+--------+----------+--+--+
| 123   | 2      | 26       |  |  |
+-------+--------+----------+--+--+
| 123   | 3      | 24       |  |  |
+-------+--------+----------+--+--+
| 123   | 4      | 25       |  |  |
+-------+--------+----------+--+--+
| 123   | 5      | 29       |  |  |
+-------+--------+----------+--+--+
| 123   | 6      | 28       |  |  |
+-------+--------+----------+--+--+
| 123   | 7      | 21       |  |  |
+-------+--------+----------+--+--+
| 123   | 8      | 20       |  |  |
+-------+--------+----------+--+--+
| 123   | 9      | 26       |  |  |
+-------+--------+----------+--+--+
| 456   | 0      | 26       |  |  |
+-------+--------+----------+--+--+
| 456   | 1      | 24       |  |  |
+-------+--------+----------+--+--+
| 456   | 2      | 20       |  |  |
+-------+--------+----------+--+--+
| 456   | 3      | 26       |  |  |
+-------+--------+----------+--+--+
| 456   | 4      | 29       |  |  |
+-------+--------+----------+--+--+
| 456   | 5      | 29       |  |  |
+-------+--------+----------+--+--+
| 456   | 6      | 24       |  |  |
+-------+--------+----------+--+--+
| 456   | 7      | 21       |  |  |
+-------+--------+----------+--+--+
| 456   | 8      | 27       |  |  |
+-------+--------+----------+--+--+
| 456   | 9      | 29       |  |  |
+-------+--------+----------+--+--+

I'd need to calculate the running maximum for each id.

np.maximum.accumulate()

would calculate the running maximum regardless of id, whereas I need a similar calculation, which however resets every time the id changes. I can think of a simple script to do it in numba (I have very large arrays and non-vectorised non-numba code would be slow), but is there an easier way to do it?

With just two values I can run:

df['running max']= np.hstack((  np.maximum.accumulate(df[ df['myids']==123 ]['myvalues']) , np.maximum.accumulate(df[ df['myids']==456 ]['myvalues']) )  )

but this is not feasible with lots and lots of values.

Solution

Here you go. Assumption is mytime is sorted.

mytime=np.tile( np.arange(0,10) , 2 )
myids=np.repeat( [123,456], [10,10] )
myvalues=np.random.random_integers(20,30,10*2)

df=pd.DataFrame()
df['myids']=myids
df['mytime']=mytime
df['myvalues']=myvalues

groups = df.groupby('myids')
df['run_max_group'] = groups['myvalues'].transform(np.maximum.accumulate)

Output...

    myids  mytime  myvalues  run_max_group
0     123       0        27             27
1     123       1        21             27
2     123       2        24             27
3     123       3        25             27
4     123       4        22             27
5     123       5        20             27
6     123       6        20             27
7     123       7        30             30
8     123       8        24             30
9     123       9        22             30
10    456       0        29             29
11    456       1        23             29
12    456       2        30             30
13    456       3        28             30
14    456       4        26             30
15    456       5        25             30
16    456       6        28             30
17    456       7        27             30
18    456       8        20             30
19    456       9        24             30