python pandas string-to-datetime

Python/pandas - How to read timezone information in to_datetime()?


I am parsing CSV files with a large number of rows that contain dates I would like to parse.

I first read the CSV file, and then I use pd.to_datetime() to convert the strings into Timestamps.

Here is what the strings look like, and the format I tried to use.

In [8]: ts_temp
Out[8]: 
0     Sun Dec 22 2019 07:40:00 GMT+0100
1     Sun Dec 22 2019 07:45:00 GMT+0100
2     Sun Dec 22 2019 07:50:00 GMT+0100

date_format = "%a %b %d %Y %H:%M:%S %Z"
index = pd.to_datetime(ts_temp, utc = True, format=date_format)

Unfortunately, I then get this error message:

ValueError: unconverted data remains: 100

I can confirm that using infer_datetime_format = True instead works and reads the timezone correctly, but it seems slow.

I would like to see whether specifying the format directly improves the running time.

Thanks for any help!


Solution

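  • OK, I finally figured it out. The correct format is: date_format = "%a %b %d %Y %H:%M:%S GMT%z"

    Here "GMT" is matched as literal text and %z parses the "+0100" UTC offset that follows it.

    It also seems to be about 40% faster than the 'classical' inference with infer_datetime_format = True.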
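
    For reference, here is a minimal sketch of that fix applied to the three sample strings from the question. The Series is rebuilt by hand here, which is an assumption for illustration; in practice ts_temp would come from the CSV file. With utc = True, the +0100 offset is applied and the results are expressed in UTC.

```python
import pandas as pd

# Sample strings copied from the question; in practice they come from the CSV.
ts_temp = pd.Series([
    "Sun Dec 22 2019 07:40:00 GMT+0100",
    "Sun Dec 22 2019 07:45:00 GMT+0100",
    "Sun Dec 22 2019 07:50:00 GMT+0100",
])

# "GMT" is literal text in the format; %z consumes the "+0100" offset.
date_format = "%a %b %d %Y %H:%M:%S GMT%z"

# utc=True converts the offset-aware values to UTC.
index = pd.to_datetime(ts_temp, utc=True, format=date_format)

# 07:40 at +01:00 corresponds to 06:40 UTC.
print(index.iloc[0])
```

    The speed-up is not measured here; it depends on the pandas version and the data, so it is worth timing both variants on the real file.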