I have a DatetimeIndex that I need to convert into a DataFrame column of strings in a specific format. My code is below; how can I optimize it?
import numpy as np
import pandas as pd
original = pd.date_range(start='20210520 09:00:00', end='20210520 12:00:00', freq='30min')
time = np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())
result = pd.DataFrame(time, columns=['time'])
print('original:')
print(original)
print('result:')
print(result)
original:
DatetimeIndex(['2021-05-20 09:00:00', '2021-05-20 09:30:00',
'2021-05-20 10:00:00', '2021-05-20 10:30:00',
'2021-05-20 11:00:00', '2021-05-20 11:30:00',
'2021-05-20 12:00:00'],
dtype='datetime64[ns]', freq='30T')
result:
time
0 09:00:00
1 09:30:00
2 10:00:00
3 10:30:00
4 11:00:00
5 11:30:00
6 12:00:00
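Instead of calling strftime on each element through np.vectorize:
time = np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())
take the time-of-day component of the index and cast it to strings:
time = original.time.astype(str)
Note that str() on a datetime.time appends fractional seconds when they are non-zero, so this matches '%H:%M:%S' only for whole-second timestamps like the ones here.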
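Another option, if you want to keep the format string explicit, is the strftime method that DatetimeIndex itself provides. A minimal sketch using the same index as in the question (passing a dict to the DataFrame constructor is just one way to name the column):

```python
import pandas as pd

original = pd.date_range(start='2021-05-20 09:00:00',
                         end='2021-05-20 12:00:00', freq='30min')

# DatetimeIndex.strftime formats every element and returns an Index of strings,
# so no np.vectorize or to_pydatetime() round trip is needed.
result = pd.DataFrame({'time': original.strftime('%H:%M:%S')})

print(result)
```

For this index it should give the same column as original.time.astype(str).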
Performance (each timing covers the whole cell, including building the index and the DataFrame):
%%timeit
original = pd.date_range(start='20210520 09:00:00', end='20210520 12:00:00', freq='30min')
time = np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())
result = pd.DataFrame(time, columns=['time'])
>>>925 µs ± 53.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
original = pd.date_range(start='20210520 09:00:00', end='20210520 12:00:00', freq='30min')
time=original.time.astype(str)
result = pd.DataFrame(time, columns=['time'])
>>>724 µs ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
On this 7-element index that is roughly 22% less time per loop.
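Because the cells above time the date_range call and the DataFrame construction together with the conversion, the gap between the approaches is partly hidden. The following is only a sketch of how the conversion step could be timed on its own with the standard timeit module. The 10,000-element index and the helper function names are my own choices for illustration, and the numbers will vary by machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# A larger index than in the question, so the conversion dominates the timing.
original = pd.date_range('2021-05-20 09:00:00', periods=10_000, freq='30min')


def with_vectorize():
    # Approach from the question: Python-level strftime call per element.
    return np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())


def with_astype():
    # Approach from this answer: datetime.time objects cast to str.
    return original.time.astype(str)


def with_strftime():
    # Built-in DatetimeIndex method with an explicit format string.
    return original.strftime('%H:%M:%S')


runs = 5
for fn in (with_vectorize, with_astype, with_strftime):
    seconds = timeit.timeit(fn, number=runs)
    print(f'{fn.__name__}: {seconds / runs * 1e3:.2f} ms per call')
```

Only the conversion is inside the timed functions; the index is built once up front.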