I'm relatively new to pandas, and I'm sure there is an easy solution, but I could not figure it out on my own. I have a dataframe of transactions that looks like this:
OrderId Size Price Side TimeSecO TimeUSecO TimeSecOT TimeUSecOT AmountBuy AmountSell
10 100 41.44000000 BUY 1403200077 47720 1403200100 640070
11 100 41.43000000 BUY 1403200077 47979 1403200112 43383
12 100 41.45000000 SELL 1403200077 48311 1403200090 61100
14 100 41.45000000 BUY 1403200092 253793 1403200092 374767
17 100 41.44000000 SELL 1403200103 24382 1403200125 929563
20 100 41.43000000 SELL 1403200116 208057 1403200116 226762
31 100 41.46000000 SELL 1403200214 874124 1403200259 751002
37 100 41.46000000 BUY 1403200278 494827 1403200300 729545
42 100 41.45000000 BUY 1403200335 601039 1403200361 925384
42 100 41.45000000 BUY 1403200335 601039 1403200361 925415
45 500 15.54000000 SELL 1403200365 997248 1403200741 26216
49 100 41.45000000 SELL 1403200375 419253 1403200402 959968
53 100 42.61000000 SELL 1403200377 403525 1403200377 403680
54 100 42.61000000 BUY 1403200377 501636 1403200377 501770
I want to calculate rolling cumulative sums for each OrderId and put them into 2 new columns with respect to Side column, CumAmountBuy and CumAmountSell, where TimeSecO > TimeSecOT.
For example, for the above dataframe the correct cumulative sums for OrderId 10, OrderId 11 and OrderId 12 would be CumAmountBuy = 0 and CumAmountSell = 0, because there are no records in the dataframe where 1403200077 > TimeUSecOT.
For OrderId 14, CumAmountBuy = 0, and CumAmountSell = 100, as OrderId 12 has already happened at this point, and it was a Side=SELL, and it fulfilled the requirement of TimeSecO > TimeSecOT (1403200092 > 1403200090).
I can think of a dirty trick but when the dataframe gets huge, I don't think it's efficient.
In [42]: df['flag'] = df.TimeSecO.map(lambda sec: (sec > df.TimeSecOT).values)
In [43]: df['CumAmountBuy'] = df.flag.map(lambda f: np.dot(f,df['Size']*(df['Side']=='BUY')))
In [44]: df['CumAmountSell'] = df.flag.map(lambda f: np.dot(f,df['Size']*(df['Side']=='SELL')))
In [45]: df[['CumAmountBuy','CumAmountSell']]
Out[45]:
CumAmountBuy CumAmountSell
OrderId
10 0 0
11 0 0
12 0 0
14 0 100
17 200 100
20 300 100
31 300 300
37 300 400
42 400 400
42 400 400
45 600 400
49 600 400
53 600 400
54 600 400