Search code examples
pythonpython-3.xpandasquantitative-finance

how to Get stats from timeseries dataframes and impute it in 3rd dataframe based on certain rules?


i have two dataframes

df1=pd.DataFrame({"a":pd.date_range("2021-01-01","2021-01-10"),
         "b" :[12,13,16,15,12,14,17,19,20,21]})

df2=pd.DataFrame({"a":pd.date_range("2021-01-01","2021-01-10"),
               "b":[np.nan,np.nan,13.7,14.3,14,15,14.3,16.7,18.8,20]})

whenevr df1[b] > df2[b] i want to buy and whenevr df1[b] < df2[b] exit.

Note:

  1. i wont take buy if previous buy did not exit
  2. if buy happens and exit condition did not so we will consider last date as exit

Required Output

#dataframe 
stat=pd.DataFrame({"stock":["b","b"],
          "Entry_date":["2021-01-03","2021-01-07"],
         "Entry_price":[16,17],
         "Exit_date":["2021-01-05","2021-01-10"],
         "Exit_price":[12,21]})

Solution

  • Update (multiple stocks)

    Given multiple stocks b and c:

    df1 = pd.DataFrame({'a':{0:pd.Timestamp('2021-01-01 00:00:00'),1:pd.Timestamp('2021-01-02 00:00:00'),2:pd.Timestamp('2021-01-03 00:00:00'),3:pd.Timestamp('2021-01-04 00:00:00'),4:pd.Timestamp('2021-01-05 00:00:00'),5:pd.Timestamp('2021-01-06 00:00:00'),6:pd.Timestamp('2021-01-07 00:00:00'),7:pd.Timestamp('2021-01-08 00:00:00'),8:pd.Timestamp('2021-01-09 00:00:00'),9:pd.Timestamp('2021-01-10 00:00:00')},'b':{0:12,1:13,2:16,3:15,4:12,5:14,6:17,7:19,8:20,9:21},'c':{0:13,1:16,2:15,3:12,4:14,5:17,6:19,7:20,8:21,9:12}})
    df2 = pd.DataFrame({'a':{0:pd.Timestamp('2021-01-01 00:00:00'),1:pd.Timestamp('2021-01-02 00:00:00'),2:pd.Timestamp('2021-01-03 00:00:00'),3:pd.Timestamp('2021-01-04 00:00:00'),4:pd.Timestamp('2021-01-05 00:00:00'),5:pd.Timestamp('2021-01-06 00:00:00'),6:pd.Timestamp('2021-01-07 00:00:00'),7:pd.Timestamp('2021-01-08 00:00:00'),8:pd.Timestamp('2021-01-09 00:00:00'),9:pd.Timestamp('2021-01-10 00:00:00')},'b':{0:np.nan,1:np.nan,2:13.7,3:14.3,4:14.0,5:15.0,6:14.3,7:16.7,8:18.8,9:20.0},'c':{0:np.nan,1:13.7,2:14.3,3:14.0,4:15.0,5:14.3,6:16.7,7:30.0,8:30.0,9:30.0}})
    
    # -------- df1 --------      ---------- df2 ----------
    #             a   b   c                  a     b     c
    # 0  2021-01-01  12  13      0  2021-01-01   NaN   NaN
    # 1  2021-01-02  13  16      1  2021-01-02   NaN  13.7
    # 2  2021-01-03  16  15      2  2021-01-03  13.7  14.3
    # 3  2021-01-04  15  12      3  2021-01-04  14.3  14.0
    # 4  2021-01-05  12  14      4  2021-01-05  14.0  15.0
    # 5  2021-01-06  14  17      5  2021-01-06  15.0  14.3
    # 6  2021-01-07  17  19      6  2021-01-07  14.3  16.7
    # 7  2021-01-08  19  20      7  2021-01-08  16.7  30.0
    # 8  2021-01-09  20  21      8  2021-01-09  18.8  30.0
    # 9  2021-01-10  21  12      9  2021-01-10  20.0  30.0
    

    First set the date as index and create the initial df1 > df2 mask:

    df1 = df1.set_index('a')
    df2 = df2.set_index('a')
    
    mask = df1 > df2
    

    To handle missing exit conditions at the end (as in stock b), append an extra line to df1 and mask:

    # duplicate the mask's last row but with False values
    # - if the last row originally was True, this closes the exit condition
    # - if the last row originally was False, this has no effect
    mask = mask.append(mask.tail(1))
    mask.iloc[-1] = False
    
    # duplicate df1's last row to match the mask
    df1 = df1.append(df1.tail(1))
    

    Index the entry points with mask & ~mask.shift() and exit points with ~mask & mask.shift():

    entry = (df1[mask & ~mask.shift().bfill()]
             .stack().reset_index(name='entry_price')
             .rename(columns={'a': 'entry_date', 'level_1': 'stock'}))
    
    exit = (df1[~mask & mask.shift().bfill()]
            .stack().reset_index(name='exit_price')
            .rename(columns={'a': 'exit_date', 'level_1': 'stock'}))
    

    Then concat() the frames:

    pd.concat([
        entry[['stock', 'entry_date', 'entry_price']],
        exit[['exit_date', 'exit_price']],
    ], axis=1).sort_values(by='stock')
    
    #   stock entry_date  entry_price   exit_date  exit_price
    # 1     b 2021-01-03         16.0  2021-01-05        12.0
    # 3     b 2021-01-07         17.0  2021-01-10        21.0
    # 0     c 2021-01-02         16.0  2021-01-04        12.0
    # 2     c 2021-01-06         17.0  2021-01-08        20.0
    

    Original answer (one stock)

    Create a boolean mask and index the entry points with mask & ~mask.shift():

    mask = df1.b > df2.b
    entry = df1[mask & ~mask.shift().bfill()].rename(columns={'a': 'entry_date', 'b': 'entry_price'})
    
    #   entry_date  entry_price
    # 0 2021-01-03           16
    # 1 2021-01-07           17
    

    And index the exit points with ~mask & mask.shift():

    exit = df1[~mask & mask.shift().bfill()]
    exit = pd.concat([exit, df1.tail(1)]).rename(columns={'a': 'exit_date', 'b': 'exit_price'})
    
    #    exit_date  exit_price
    # 0 2021-01-05          12
    # 1 2021-01-10          21
    

    Then concat() the frames:

    stat = pd.concat([
        entry.reset_index(drop=True),
        exit.reset_index(drop=True),
    ], axis=1).assign(stock='b').dropna()
    
    #   entry_date  entry_price  exit_date  exit_price stock
    # 0 2021-01-03           16 2021-01-05          12     b
    # 1 2021-01-07           17 2021-01-10          21     b