Search code examples
pythonpandasnumpystocktrading

Keep only day 1 from multi day runners - stock market, Python


I am trying to build a strategy around intraday runners in the stock market, and I need to eliminate from my dataset any date from a symbol which occurs after the first day of a multi day runner.

However, only when that symbol runs multiple days. If it is only a day 1, then I need to keep it.

So let's assume a stock moves X% every day, for 3 days, I only want to keep the 1st day.

Once the run ends for that symbol I need to repeat the process (e.g keep only the first day from all other runs).

Say I already have a dataframe of tickers and dates which fit my gap criteria, like so:

   symbol        date
1    FOXO  2022-12-22 // day 1 - keep
2    FOXO  2022-12-23 // day 2 - remove
3    FOXO  2022-12-27 // day 3 - remove - we had trading breaks here for Christmas and weekends, therefore it's still considered day 3 //
4    FOXO  2022-12-29 // day 1 - keep
5    FOXO  2022-12-30 // day 2 - remove
6    FOXO  2023-01-03 // day 1 - keep
7    FOXO  2023-01-04 // day 2 - remove
8    FOXO  2023-01-05 // day 3 - remove
6    APPL  2023-01-03 // day 1 - keep
7    APPL  2023-01-04 // day 2 - remove
8    APPL  2023-01-05 // day 3 - remove

How can I achieve the desired result with pandas?


Solution

  • In Short

    You can use numpy.busday_count(start, end, holidays=[]) to do this. This will ignore the weekends and holidays.

    import numpy as np
    np.busday_count(
        '2022-12-23',
        '2022-12-27',
        holidays=['2022-12-24', '2022-12-25', '2022-12-26']
    )
    

    In case you want to include the weekends, you should be able to specify that with weekmask (see documentation).

    Full Example

    # Prepare Packages
    import pandas as pd
    import numpy as np
    
    # Prepare Dataset
    df = pd.DataFrame({
        'symbol': ['FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'APPL', 'APPL', 'APPL'],
        'date': ['2022-12-22', '2022-12-23', '2022-12-27', '2022-12-29', '2022-12-30', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-02', '2023-01-03', '2023-01-04']
    })
    
    # List Holidays
    holidays = ['2022-12-24', '2022-12-25', '2022-12-26']
    
    # Get the consecutive date
    df['prev_date'] = df.groupby('symbol')['date'].shift(1)
    
    # Get trading days difference (ignores weekends and holidays)
    df['trading_days_diff'] = np.busday_count(
        df['prev_date'].fillna(df['date']).to_list(),
        df['date'].to_list(),
        holidays=holidays
    )
    
    # Get rows to keep
    df[ df['trading_days_diff']!=1 ]
    

    To see the days you could add the code below

    df['run_id'] = (df['trading_days_diff']!=1).cumsum()
    df['days'] = df.groupby('run_id').cumcount()+1
    

    To get only runs with multiple days you could do the following:

    df[
        (df['run_id'].isin(df[ df['days']>1 ]['run_id']))
        & (df['days'] == 1)
    ]