I was trying to convert a date-time character column in a huge data frame to POSIXct for further processing. An example value from the column is: "Wed Jul 01 00:10:32 UTC 2020".
Because of the large number of rows (>5 million), I was trying the fastPOSIXct function in the fasttime package to speed it up. My plan was to convert the date-time strings to POSIXlt and then pass the result to fastPOSIXct.
In the process, I found that base::strptime and lubridate::fast_strptime return different results:
> base::strptime("Wed Jul 01 00:10:32 UTC 2020", format = '%a %b %d %H:%M:%S UTC %Y', tz = "UTC")
[1] "2020-07-01 00:10:32 UTC"
and
> lubridate::fast_strptime("Wed Jul 01 00:10:32 UTC 2020", format = '%a %b %d %H:%M:%S UTC %Y', tz = "UTC")
[1] NA
Why does fast_strptime return NA? Is there a faster way to do this format conversion?
Thanks!
This happens because fast_strptime does not support the %a (abbreviated weekday name) format specifier. If you remove the weekday name from both the input string and the format, it returns the same result as base::strptime:
fast_strptime("Jul 01 00:10:32 UTC 2020", format = '%b %d %H:%M:%S UTC %Y', tz = "UTC")
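For a whole column, one possible approach is to strip the weekday prefix with a vectorised sub() before parsing. This is only a minimal sketch: the vector x stands in for the real column, the regular expression assumes every value starts with a three-letter weekday followed by a single space, and lt = FALSE is used so that fast_strptime returns POSIXct directly instead of POSIXlt.

```r
library(lubridate)

# Hypothetical sample standing in for the real date-time column
x <- c("Wed Jul 01 00:10:32 UTC 2020",
       "Thu Jul 02 13:45:00 UTC 2020")

# Drop the leading three-letter weekday and the space after it
# (assumes every value follows this exact layout)
x_trim <- sub("^[A-Za-z]{3} ", "", x)

# Parse the remaining text; lt = FALSE asks for POSIXct rather than POSIXlt
parsed <- fast_strptime(x_trim,
                        format = "%b %d %H:%M:%S UTC %Y",
                        tz = "UTC",
                        lt = FALSE)
```

Because the weekday is redundant given the full date, dropping it loses no information. It is worth checking anyNA(parsed) on the real data to catch rows that do not match the assumed layout.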