I need to reduce the performance of the following operation :
st = time.time()
bh = CustomBusinessHour(start='00:00', end='23:00')
bdates = pd.date_range(start='2024-01-01 00:00:00', end='2024-12-31 23:00:00', freq=bh, name='ts', closed=None)
print_statistics(f'{bdates}', 'filter_bp()', 'utils.py', time.time() - st)
which gives has the following output :
DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 01:00:00',
'2024-01-01 02:00:00', '2024-01-01 03:00:00',
'2024-01-01 04:00:00', '2024-01-01 05:00:00',
'2024-01-01 06:00:00', '2024-01-01 07:00:00',
'2024-01-01 08:00:00', '2024-01-01 09:00:00',
...
'2024-12-31 13:00:00', '2024-12-31 14:00:00',
'2024-12-31 15:00:00', '2024-12-31 16:00:00',
'2024-12-31 17:00:00', '2024-12-31 18:00:00',
'2024-12-31 19:00:00', '2024-12-31 20:00:00',
'2024-12-31 21:00:00', '2024-12-31 22:00:00'],
dtype='datetime64[ns]', name='ts', length=6026, freq='CBH')
Execution time: 0.40148735046386718750 s
If I try just a similar operation without CustomBusinessHour I have a much better execution time :
st = time.time()
test1 = pd.date_range(start='2024-01-01 00:00:00', end='2024-12-31 23:00:00', freq='H', name='ts', closed=None)
print_statistics(f'{test1}', 'filter_bp()', 'utils.py', time.time() - st)
which gives has the following output :
DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 01:00:00',
'2024-01-01 02:00:00', '2024-01-01 03:00:00',
'2024-01-01 04:00:00', '2024-01-01 05:00:00',
'2024-01-01 06:00:00', '2024-01-01 07:00:00',
'2024-01-01 08:00:00', '2024-01-01 09:00:00',
...
'2024-12-31 14:00:00', '2024-12-31 15:00:00',
'2024-12-31 16:00:00', '2024-12-31 17:00:00',
'2024-12-31 18:00:00', '2024-12-31 19:00:00',
'2024-12-31 20:00:00', '2024-12-31 21:00:00',
'2024-12-31 22:00:00', '2024-12-31 23:00:00'],
dtype='datetime64[ns]', name='ts', length=8784, freq='H')
Execution time: 0.00157237052917480469 s
So the execution time is 400 times faster, just have to remove those records.
Anyone has any idea how to do it ? Many thanks !
The solution is the following portion of code :
st = time.time()
test1 = pd.date_range(start='2024-01-01 00:00:00', end='2024-12-31 23:00:00', freq='H', name='ts', closed=None)
test2 = pd.DatetimeIndex(list(filter(lambda x: (np.is_busday(np.datetime64(x, 'D'))) and (x.hour != 23), test1)))
print_statistics(f'{test2}', 'filter_bp()', 'utils.py', time.time() - st)
which gives the following output :
DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 01:00:00',
'2024-01-01 02:00:00', '2024-01-01 03:00:00',
'2024-01-01 04:00:00', '2024-01-01 05:00:00',
'2024-01-01 06:00:00', '2024-01-01 07:00:00',
'2024-01-01 08:00:00', '2024-01-01 09:00:00',
...
'2024-12-31 13:00:00', '2024-12-31 14:00:00',
'2024-12-31 15:00:00', '2024-12-31 16:00:00',
'2024-12-31 17:00:00', '2024-12-31 18:00:00',
'2024-12-31 19:00:00', '2024-12-31 20:00:00',
'2024-12-31 21:00:00', '2024-12-31 22:00:00'],
dtype='datetime64[ns]', length=6026, freq=None) Execution time: 0.13873147964477539062 s