Search code examples
pythonpandasmergegroupingtrading

Grouping/mapping using values from a different DataFrame as the mapping function PYTHON PANDAS


I have two pandas dataframes.

The Trade DataFrame has a list of entry and exit times for different trades

Trade#  Entrytimestamp         Exittimestamp 
0 2003-01-02 10:30:00  2003-01-07 14:30:00
1 2003-01-07 13:35:00  2003-01-09 12:30:00
2 2003-01-13 14:30:00  2003-01-16 11:10:00
3 2003-01-14 09:50:00  2003-01-23 13:20:00
4 2003-01-17 13:30:00  2003-01-29 14:30:00

The tenmin DataFrame has a list of ten minute prices for the stock in question

timestamp             price
2003-01-02 00:00:00     50.12    
2003-01-02 00:10:00     50.15
2003-01-02 00:20:00     50.14
2003-01-02 00:30:00     50.13
2003-01-02 00:40:00     50.14

What I am trying to do is find the max and min prices during each trade to get some kind of result like this:

Trade#  Entrytimestamp         Exittimestamp   IntraTradeMAX  IntraTradeMIN
0 2003-01-02 10:30:00  2003-01-07 14:30:00     60.75          60.40
1 2003-01-07 13:35:00  2003-01-09 12:30:00     70.52          63.26
2 2003-01-13 14:30:00  2003-01-16 11:10:00     69.25          67.52
3 2003-01-14 09:50:00  2003-01-23 13:20:00     62.90          61.00
4 2003-01-17 13:30:00  2003-01-29 14:30:00     67.58          65.15

I know I need to just group the ten minute bar file by the entry and exit times for each trade and take the max, but I can't figure out how to do this without iterating through every trade individually. Seeing as my file has over 10,000 trades, doing this would take way too long. Any suggestions on how to do this in a pandas way?

If it helps, my current solution (too slow), is as follows.

def intramax(row):  
    trademax = ten['price'][np.logical_and(ten['timestamp'] > row['Entrytimestamp'], ten['timestamp'] < row['Exittimestamp'])].max()
    return trademax  

trade['IntraTradeMAX'] = trade.apply(intramax, axis=1) 

Solution

  • I'm sure you could do even better with a better algo and there may be some clever way in pandas I'm not thinking of, but a pretty naive numba implementation seems to do pretty well.

    In [110]: @numba.jit(nopython=True)
         ...: def intrastats(entry, exit_, ten_times, ten_prices):
         ...:     N = len(entry)
         ...:     K = len(ten_times)
         ...:     ans = np.zeros((N, 2))
         ...:     for i in range(N):
         ...:         ans[i, 0] = -np.inf
         ...:         ans[i, 1] = np.inf
         ...:         for j in range(K):
         ...:             if ten_times[j] < exit_[i] and ten_times[j] > entry[i]:
         ...:                 ans[i, 0] = max(ten_prices[j], ans[i, 0])
         ...:                 ans[i, 1] = min(ten_prices[j], ans[i, 1])
         ...:     return ans
    
    In [114]: %timeit intrastats(trade['Entrytimestamp'].values.view('int64'), 
                                 trade['Exittimestamp'].values.view('int64'), 
                                 ten['timestamp'].values.view('int64'),
                                 ten['price'].values)
    10000 loops, best of 3: 43.1 µs per loop
    
    In [115]: %timeit trade.apply(intramax, axis=1)
    100 loops, best of 3: 3.2 ms per loop