I have two pandas dataframes.
The Trade DataFrame has a list of entry and exit times for different trades
Trade# Entrytimestamp Exittimestamp
0 2003-01-02 10:30:00 2003-01-07 14:30:00
1 2003-01-07 13:35:00 2003-01-09 12:30:00
2 2003-01-13 14:30:00 2003-01-16 11:10:00
3 2003-01-14 09:50:00 2003-01-23 13:20:00
4 2003-01-17 13:30:00 2003-01-29 14:30:00
The tenmin DataFrame has a list of ten minute prices for the stock in question
timestamp price
2003-01-02 00:00:00 50.12
2003-01-02 00:10:00 50.15
2003-01-02 00:20:00 50.14
2003-01-02 00:30:00 50.13
2003-01-02 00:40:00 50.14
What I am trying to do is find the max and min prices during each trade to get some kind of result like this:
Trade# Entrytimestamp Exittimestamp IntraTradeMAX IntraTradeMIN
0 2003-01-02 10:30:00 2003-01-07 14:30:00 60.75 60.40
1 2003-01-07 13:35:00 2003-01-09 12:30:00 70.52 63.26
2 2003-01-13 14:30:00 2003-01-16 11:10:00 69.25 67.52
3 2003-01-14 09:50:00 2003-01-23 13:20:00 62.90 61.00
4 2003-01-17 13:30:00 2003-01-29 14:30:00 67.58 65.15
I know I need to just group the ten minute bar file by the entry and exit times for each trade and take the max, but I can't figure out how to do this without iterating through every trade individually. Seeing as my file has over 10,000 trades, doing this would take way too long. Any suggestions on how to do this in a pandas way?
If it helps, my current solution (too slow), is as follows.
def intramax(row):
trademax = ten['price'][np.logical_and(ten['timestamp'] > row['Entrytimestamp'], ten['timestamp'] < row['Exittimestamp'])].max()
return trademax
trade['IntraTradeMAX'] = trade.apply(intramax, axis=1)
I'm sure you could do even better with a better algo and there may be some clever way in pandas
I'm not thinking of, but a pretty naive numba
implementation seems to do pretty well.
In [110]: @numba.jit(nopython=True)
...: def intrastats(entry, exit_, ten_times, ten_prices):
...: N = len(entry)
...: K = len(ten_times)
...: ans = np.zeros((N, 2))
...: for i in range(N):
...: ans[i, 0] = -np.inf
...: ans[i, 1] = np.inf
...: for j in range(K):
...: if ten_times[j] < exit_[i] and ten_times[j] > entry[i]:
...: ans[i, 0] = max(ten_prices[j], ans[i, 0])
...: ans[i, 1] = min(ten_prices[j], ans[i, 1])
...: return ans
In [114]: %timeit intrastats(trade['Entrytimestamp'].values.view('int64'),
trade['Exittimestamp'].values.view('int64'),
ten['timestamp'].values.view('int64'),
ten['price'].values)
10000 loops, best of 3: 43.1 µs per loop
In [115]: %timeit trade.apply(intramax, axis=1)
100 loops, best of 3: 3.2 ms per loop