Trading data is recorded on non-holiday business days starting at 9:30AM, and data is recorded periodically (5 seconds, 5 minutes, 30 minutes, etc) until 4:00PM. For instance, [9:30AM, 9:35AM,...,3:50PM, 3:55PM] on non-holiday business days are times for which there are 5 minute data.
Does pandas
have the capabilities required to correctly work with this sort of time structure? I have seen that there are ways of adding custom holidays to business days (9:00AM-4:00PM). But I have yet to see anyway of defining custom business hours, like (9:30AM-4:00PM). For instance, is it possible to define a custom trading hour [9:30AM, 10:30AM, ... 3:30PM], or trading 5 minutes that takes advantage of whatever code powers the business hour?
For instance, is there a pandas
-powered way to generate a DatetimeIndex
of, lets say, 5 minute frequency that follows the rules of trading times:
Or is this just not supported at this time?
In other words, what methods/classes does pandas
currently have to implement this kind of functionality? If there are none, does anyone know of other libraries that could help?
I had this same question, at least the part about
is there a pandas-powered way to generate a DatetimeIndex of, lets say, 5 minute frequency that follows the rules of trading times?
And I have an answer for you below, but first, regarding:
Does pandas have the capabilities required to correctly work with this sort of time structure?
The phrase "work with" is rather vague; What "work" is it that you want to do?
Pandas can definitely work correctly with this sort of time structure to do some things but perhaps not others.
and regarding.
For instance, is it possible to define a custom trading hour [9:30AM, 10:30AM, ... 3:30PM] ... ?
The answer is yes, however there are some limitations which I will describe below. That said, I will focus primarily on the ability to generate a datetime index that follows the rules of trading times and trading days, and demonstrate below how to do this using pandas. First I will mention some of the existing pandas tools and their limitations.
I spent the better part of yesterday evening researching this. Most all of the functions that pandas has to generate a datetime index range accept an input frequency freq
as a string (such as '15T'
for 15 minutes, or 'D'
for daily).
These same functions will also typically accept a pandas time series offset object in place of the frequency. There are many different kinds of offsets: Each defines a frequency that can have rules, such as skipping over weekends (for example, offsets.BusinessDay) or generating a datetime index for business hours only such as only from 9am till 5pm.
The main problem that I have encountered with frequencies, and with offset objects, is that (for the most part) there is no way to combine them.
For example, using the BusinessHour offset class I can specify that I want to generate an index containing only business hours (trading hours), and I can even do so only for business days (over a bunch of dates) but I cannot combine this with specifying a frequency such as once per minute, or once every 15 minutes. Rather, the BusinessHour offset class appears to default to a frequence of once per hour only, and there is no way I have found to change it.
Alternatively, for example, I can use the pandas.bdate_range()
specifying a frequency of '30T'
to generate an index point every 30 minutes, but non-business days will be included. If I set the frequency to 'B'
for business days only, then it skips non-business days, but I only get one index point per day. No way, that I found, to combine these two frequencies.
The simplest solution that I came up with is to generate a list of business days (trading days) and then loop through that list generating an index for each date at the desired frequency (1 minute, or 5 minutes, or 15 minutes) from open time till close time on each date. I then use pandas DatetimeIndex.union_many()
method to combine the datetime indexes into one. The code looks like this:
def trading_day_range(bday_start=None,bday_end=None,bday_freq='B',
open_time='09:30',close_time='16:00',iday_freq='15T',weekmask=None):
if bday_start is None: bday_start = pd.Timestamp.today()
if bday_end is None: bday_end = bday_start + pd.Timedelta(days=1)
daily = []
for d in pd.bdate_range(start=bday_start,end=bday_end,freq=bday_freq,weekmask=weekmask):
topen = pd.Timestamp(open_time)
d1 = d.replace(hour=topen.hour,minute=topen.minute)
tclose = pd.Timestamp(close_time)
d2 = d.replace(hour=tclose.hour,minute=tclose.minute+1)
daily.append(pd.date_range(d1,d2,freq=iday_freq))
index = daily[0].union_many(daily[1:])
return index
Here are some examples of using it:
ix = trading_day_range()
print('len(ix)=',len(ix))
print(ix[20:40])
print(ix[-20:])
len(ix)= 54
DatetimeIndex(['2021-04-13 14:30:00', '2021-04-13 14:45:00',
'2021-04-13 15:00:00', '2021-04-13 15:15:00',
'2021-04-13 15:30:00', '2021-04-13 15:45:00',
'2021-04-13 16:00:00', '2021-04-14 09:30:00',
'2021-04-14 09:45:00', '2021-04-14 10:00:00',
'2021-04-14 10:15:00', '2021-04-14 10:30:00',
'2021-04-14 10:45:00', '2021-04-14 11:00:00',
'2021-04-14 11:15:00', '2021-04-14 11:30:00',
'2021-04-14 11:45:00', '2021-04-14 12:00:00',
'2021-04-14 12:15:00', '2021-04-14 12:30:00'],
dtype='datetime64[ns]', freq=None)
DatetimeIndex(['2021-04-14 11:15:00', '2021-04-14 11:30:00',
'2021-04-14 11:45:00', '2021-04-14 12:00:00',
'2021-04-14 12:15:00', '2021-04-14 12:30:00',
'2021-04-14 12:45:00', '2021-04-14 13:00:00',
'2021-04-14 13:15:00', '2021-04-14 13:30:00',
'2021-04-14 13:45:00', '2021-04-14 14:00:00',
'2021-04-14 14:15:00', '2021-04-14 14:30:00',
'2021-04-14 14:45:00', '2021-04-14 15:00:00',
'2021-04-14 15:15:00', '2021-04-14 15:30:00',
'2021-04-14 15:45:00', '2021-04-14 16:00:00'],
dtype='datetime64[ns]', freq=None)
ix1 = trading_day_range('01/01/2021 09:30','01/13/2021 16:00',
bday_freq='C',iday_freq='30T',weekmask='Wed Thu Fri')
print('len(ix1)=',len(ix1))
print(ix1[20:40])
print(ix1[-20:])
len(ix1)= 70
DatetimeIndex(['2021-01-06 12:30:00', '2021-01-06 13:00:00',
'2021-01-06 13:30:00', '2021-01-06 14:00:00',
'2021-01-06 14:30:00', '2021-01-06 15:00:00',
'2021-01-06 15:30:00', '2021-01-06 16:00:00',
'2021-01-07 09:30:00', '2021-01-07 10:00:00',
'2021-01-07 10:30:00', '2021-01-07 11:00:00',
'2021-01-07 11:30:00', '2021-01-07 12:00:00',
'2021-01-07 12:30:00', '2021-01-07 13:00:00',
'2021-01-07 13:30:00', '2021-01-07 14:00:00',
'2021-01-07 14:30:00', '2021-01-07 15:00:00'],
dtype='datetime64[ns]', freq=None)
DatetimeIndex(['2021-01-08 13:30:00', '2021-01-08 14:00:00',
'2021-01-08 14:30:00', '2021-01-08 15:00:00',
'2021-01-08 15:30:00', '2021-01-08 16:00:00',
'2021-01-13 09:30:00', '2021-01-13 10:00:00',
'2021-01-13 10:30:00', '2021-01-13 11:00:00',
'2021-01-13 11:30:00', '2021-01-13 12:00:00',
'2021-01-13 12:30:00', '2021-01-13 13:00:00',
'2021-01-13 13:30:00', '2021-01-13 14:00:00',
'2021-01-13 14:30:00', '2021-01-13 15:00:00',
'2021-01-13 15:30:00', '2021-01-13 16:00:00'],
dtype='datetime64[ns]', freq=None)