I am parsing csv files with a significant number of rows, and containing dates I would like to parse.
I read first the csv file, and then I am using pd.to_datetime()
to transform string into Timestamps.
Here is what looks like the strings, and the format I tried to use.
In [8]: ts_temp
Out[8]:
0 Sun Dec 22 2019 07:40:00 GMT+0100
1 Sun Dec 22 2019 07:45:00 GMT+0100
2 Sun Dec 22 2019 07:50:00 GMT+0100
date_format = "%a %b %d %Y %H:%M:%S %Z"
index = pd.to_datetime(ts_temp, utc = True, format=date_format)
Unfortunately, I then get this error message.
ValueError: unconverted data remains: 100
I can confirm using infer_datetime_format = True
instead works, with correct timezone reading, but it seems to me that it does take time.
I would have liked to see if I can improve running time by specifying directly the format.
Thanks for any help, bests!
Ok, I finally found out. The correct format is:
date_format = "%a %b %d %Y %H:%M:%S GMT%z"
And it seems to be about 40% faster using it than 'classical' infer.