I have a data frame, e, that has a Date column. In the original file, the dates are saved as factors. So I convert them to character strings. In those strings, they are in the format Day/Month/Year Hour:Minute, e.g. 27/05/2014 15:42. I then use strptime to convert these strings to POSIXct objects.
e$Date = as.character(e$Date)
e$Date = strptime(e$Date, tz = "UTC", format = "%d/%m/%Y %H:%M")
All looks good in the output because the dates now are in the form "2014-05-27 15:42:00 UTC". However. When I use which() to find the dates after "2016-02-14 00:51:00 UTC", I get the wrong dates!
which(e$Date>"2016-02-14 00:51:00 UTC")
The command returns a list of dates that start with "2016-02-12 08:54:00 UTC" even though there are several dozen dates after my start point, but before "2016-02-12 08:54:00 UTC" that it does not include. It seems to be shifted ahead by 8 or 9 hours.
I know I'm doing something wrong with the timezones. I saw the answer on Subset dataframe based on POSIXct date and time greater than datetime using dplyr which seemed really close to my problem. However, I did specify the timezone in my original strptime command (I've also tried using as.POSIXct instead of strptime and have the same problem).
Can anyone see what I am doing wrong???
You should probably keep it as a POSIXlt
datetime (which strptime
creates), and then compare against another datetime object in the appropriate timezone. E.g.:
x <- strptime(c("2018-08-30 08:04", "2018-08-30 08:05", "2018-08-30 08:06",
"2018-08-30 08:07", "2018-08-30 08:08"), format="%Y-%m-%d %H:%M", tz="UTC")
#[1] "2018-08-30 08:04:00 UTC" "2018-08-30 08:05:00 UTC"
#[3] "2018-08-30 08:06:00 UTC" "2018-08-30 08:07:00 UTC"
#[5] "2018-08-30 08:08:00 UTC"
Text comparison gives odd results:
x > "2018-08-30 08:06:00 UTC"
#TRUE TRUE TRUE TRUE TRUE
Datetime comparison:
x > as.POSIXlt("2018-08-30 08:06:00", tz="UTC")
#[1] FALSE FALSE FALSE TRUE TRUE