I am trying to build a strategy around intraday runners in the stock market, and I need to eliminate from my dataset any date from a symbol which occurs after the first day of a multi day runner.
However, only when that symbol runs multiple days. If it is only a day 1, then I need to keep it.
So let's assume a stock moves X% every day, for 3 days, I only want to keep the 1st day.
Once the run ends for that symbol I need to repeat the process (e.g keep only the first day from all other runs).
Say I already have a dataframe of tickers and dates which fit my gap criteria, like so:
symbol date
1 FOXO 2022-12-22 // day 1 - keep
2 FOXO 2022-12-23 // day 2 - remove
3 FOXO 2022-12-27 // day 3 - remove - we had trading breaks here for Christmas and weekends, therefore it's still considered day 3 //
4 FOXO 2022-12-29 // day 1 - keep
5 FOXO 2022-12-30 // day 2 - remove
6 FOXO 2023-01-03 // day 1 - keep
7 FOXO 2023-01-04 // day 2 - remove
8 FOXO 2023-01-05 // day 3 - remove
6 APPL 2023-01-03 // day 1 - keep
7 APPL 2023-01-04 // day 2 - remove
8 APPL 2023-01-05 // day 3 - remove
How can I achieve the desired result with pandas?
In Short
You can use numpy.busday_count(start, end, holidays=[])
to do this. This will ignore the weekends and holidays.
import numpy as np
np.busday_count(
'2022-12-23',
'2022-12-27',
holidays=['2022-12-24', '2022-12-25', '2022-12-26']
)
In case you want to include the weekends, you should be able to specify that with weekmask
(see documentation).
Full Example
# Prepare Packages
import pandas as pd
import numpy as np
# Prepare Dataset
df = pd.DataFrame({
'symbol': ['FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'FOXO', 'APPL', 'APPL', 'APPL'],
'date': ['2022-12-22', '2022-12-23', '2022-12-27', '2022-12-29', '2022-12-30', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-02', '2023-01-03', '2023-01-04']
})
# List Holidays
holidays = ['2022-12-24', '2022-12-25', '2022-12-26']
# Get the consecutive date
df['prev_date'] = df.groupby('symbol')['date'].shift(1)
# Get trading days difference (ignores weekends and holidays)
df['trading_days_diff'] = np.busday_count(
df['prev_date'].fillna(df['date']).to_list(),
df['date'].to_list(),
holidays=holidays
)
# Get rows to keep
df[ df['trading_days_diff']!=1 ]
To see the days
you could add the code below
df['run_id'] = (df['trading_days_diff']!=1).cumsum()
df['days'] = df.groupby('run_id').cumcount()+1
To get only runs with multiple days you could do the following:
df[
(df['run_id'].isin(df[ df['days']>1 ]['run_id']))
& (df['days'] == 1)
]