I have a file containing date and time information as a character string. I want to convert it to a datetime POSIXlt object and to that end I use the strftime() base function in R.
While the command works fine when I am passing it a character string, or when I apply it a small slice of the dataframe, when I apply it in a larger slice or in the entire datadrame it fails to capture the Hour. Please see below:
d = "2017-11-18 01:00:00"
t = strftime(d, format = "%Y-%m-%d %H:%M")
t
'2017-11-18 01:00'
head(data %>% dplyr::slice(1:1000) %>% mutate(DateTime1 = strftime(DateTime, format = "%Y-%m-%d %H:%M")))
DateTime Junction Vehicles ID DateTime1
2015-11-01 00:00:00 1 15 20151101001 2015-11-01 00:00
2015-11-01 01:00:00 1 13 20151101011 2015-11-01 01:00
2015-11-01 02:00:00 1 10 20151101021 2015-11-01 02:00
2015-11-01 03:00:00 1 7 20151101031 2015-11-01 03:00
2015-11-01 04:00:00 1 9 20151101041 2015-11-01 04:00
2015-11-01 05:00:00 1 6 20151101051 2015-11-01 05:00
head(data %>% mutate(DateTime1 = strftime(DateTime, format = "%Y-%m-%d %H:%M")))
DateTime Junction Vehicles ID DateTime1
2015-11-01 00:00:00 1 15 20151101001 2015-11-01 00:00
2015-11-01 01:00:00 1 13 20151101011 2015-11-01 00:00
2015-11-01 02:00:00 1 10 20151101021 2015-11-01 00:00
2015-11-01 03:00:00 1 7 20151101031 2015-11-01 00:00
2015-11-01 04:00:00 1 9 20151101041 2015-11-01 00:00
2015-11-01 05:00:00 1 6 20151101051 2015-11-01 00:00
How this erratic behavior is to be explained and how can I convert the datetime column for the entire dataset?
Your advice will be appreciated.
I can't tell exactly, but there might be some inconsistency in your DateTime variable. Since it seems to be a character variable, when the vectorized version of your conversion finds an unexpected behaviour it won't handle it well because the same conversion is applied to every entry (it'll use a format that works in all cases), that is probably the reason why it worked in a slice. I'd suggest either gradually increasing your slice size in order to track where the inconsistency appears (there might be a smarter way to check it).
Another note is that strftime
requires its first argument to be convertible to POSIXlt, so I'd recommend passing a POSIXlt object to avoid your function choosing how to convert your original string. You could use for example:
t = strftime(strptime(d, format = "%Y-%m-%d %H:%M:%OS"), format = "%Y-%m-%d %H:%M")
Then, hopefully, when converting your character variable to a POSIXlt the way you expect it to be, via strptime
, it will throw an error due to inconsistent rows.
A third, but not as good solution would be to use rowwise()
, what would convert each row separately, producing expected results except for the inconsistencies (It'd look for appropriate format in each row, instead of one format that works for everyone).