python pandas dataframe datetimeindex

How to optimize converting a DatetimeIndex to a DataFrame column in a specific format?


I have a DatetimeIndex that I need to convert into a DataFrame column of strings in a specific format (here HH:MM:SS). My current code is below. How can I optimize it?

import numpy as np
import pandas as pd

original = pd.date_range(start='20210520 09:00:00', end='20210520 12:00:00', freq='30min')
time = np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())
result = pd.DataFrame(time, columns=['time'])
print('original:')
print(original)
print('result:')
print(result)

Output:

original:
DatetimeIndex(['2021-05-20 09:00:00', '2021-05-20 09:30:00',
               '2021-05-20 10:00:00', '2021-05-20 10:30:00',
               '2021-05-20 11:00:00', '2021-05-20 11:30:00',
               '2021-05-20 12:00:00'],
              dtype='datetime64[ns]', freq='30T')
result:
       time
0  09:00:00
1  09:30:00
2  10:00:00
3  10:30:00
4  11:00:00
5  11:30:00
6  12:00:00

Solution

  • Instead of this:

    time = np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())
    

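    Use the index's time attribute, which returns an array of datetime.time objects, and cast it to strings:

    time = original.time.astype(str)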
    

    Performance (each cell also times building the index and the DataFrame, not only the conversion):

    %%timeit
    original = pd.date_range(start='20210520 09:00:00', end='20210520 12:00:00', freq='30min')
    time = np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())
    result = pd.DataFrame(time, columns=['time'])
    
    >>>925 µs ± 53.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    %%timeit
    original = pd.date_range(start='20210520 09:00:00', end='20210520 12:00:00', freq='30min')
    time = original.time.astype(str)
    result = pd.DataFrame(time, columns=['time'])
          
    >>>724 µs ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    

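Another option is DatetimeIndex.strftime, which applies a format string to the whole index in one call. The sketch below assumes the same original index as in the question. It is not part of the answer above and was not included in the timings shown there, so measure it on your own data before relying on it.

```python
import pandas as pd

# Same index as in the question: every 30 minutes from 09:00 to 12:00.
original = pd.date_range(start='20210520 09:00:00', end='20210520 12:00:00', freq='30min')

# DatetimeIndex.strftime formats every element and returns an Index of strings,
# so no Python-level lambda or np.vectorize is needed.
time = original.strftime('%H:%M:%S')

# Build the one-column DataFrame from the formatted strings.
result = pd.DataFrame({'time': time})
print(result)
```

Unlike casting original.time to str, which always yields the ISO form and adds fractional seconds when they are non-zero, strftime accepts any format string, for example '%H:%M' to drop the seconds.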