Search code examples
pythonpandasstrftime

Specifying a strftime format to speed up pandas' to_datetime() method


Consider the following code:

import pandas as pd
some_time='01/01/2011 12:02:41 AM'
print(pd.to_datetime(some_time))
print(pd.to_datetime(some_time, format='%m/%d/%Y %I:%M:%S %r'))

The first to_datetime() conversion works and prints the output

2011-01-01 00:02:41

Unfortunately, in my real application I'm dealing with a DataFrame with over 2 million rows and the default to_datetime() is prohibitively slow, even if I set infer_datetime_format=True in the keyword arguments.

I've read that to_datetime() can be sped up by specifying the string format explicitly. I've tried this following http://www.tutorialspoint.com/python/time_strftime.htm, but my attempt above fails with the error ValueError: 'r' is a bad directive in format '%m/%d/%Y %I:%M:%S %r'.

How can I specify the correct strftime format to convert '01/01/2011 12:02:41 AM' to datetime?


Solution

  • I think you just need %p instead of %r. The difference is %r expects punctuation (A.M. or P.M.), whereas %p does not (AM or PM).

    Your code does not produce any errors when I make the change:

    pd.to_datetime(some_time, format='%m/%d/%Y %I:%M:%S %p')