I don't fully understand the behavior of converting date and time strings to POSIX objects. For example, I have a vector of two strings representing date and time. Conversion without specifying the format ignores the time portion and sets the time zone to IST:
as.POSIXct(c('2017-03-24 02:59:59', '2017-03-24 03:00:00'))
[1] "2017-03-24 IST" "2017-03-24 IST"
But when I specify the format, it sets a different time zone, and it fails for the string where the hour is '02', but not when the time is one second later.
as.POSIXct(c('2017-03-24 02:59:59', '2017-03-24 03:00:00'), format="%Y-%m-%d %H:%M:%OS")
[1] NA "2017-03-24 03:00:00 IDT"
Three questions:
- Why does the time zone differ between the two lines?
As said in the comments, it differs because of daylight saving time: IST is Israel Standard Time and IDT is Israel Daylight Time, and the switch happened on this very date. Since you don't include the zone in the call to as.POSIXct, you are prone to many problems. Be explicit about the time zone whenever possible. This is a no-kidding moment: if you know it (and it is not part of the string), never assume it will be inferred correctly. In my experience, R gets it wrong often enough to be really annoying, and the resulting errors are very difficult to detect, find, and fix.
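To illustrate why the zone matters, here is a minimal sketch that parses one wall-clock string under two explicit zones. The string and the zone name "Asia/Jerusalem" (assumed here to be the canonical tz database name behind "Israel") are illustrative choices, not taken from the question.

```r
# Same wall-clock string, parsed under two explicit time zones.
x <- "2017-03-24 12:00:00"
fmt <- "%Y-%m-%d %H:%M:%S"

utc <- as.POSIXct(x, format = fmt, tz = "UTC")
jer <- as.POSIXct(x, format = fmt, tz = "Asia/Jerusalem")  # assumed zone name

# The printed clock time is identical, but the underlying instants differ
# by the zone's UTC offset at that moment (in seconds).
offset_seconds <- as.numeric(utc) - as.numeric(jer)
offset_seconds
```

If the zone were left out, R would pick whatever the session's local zone happens to be, so the same script could yield different instants on different machines.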
- Why, when no format is given, does it ignore the time portion?
The time is not missing from the object, though it might look like it: the date-only output is a symptom of how the value is printed, not how it is stored. (This is common in many of R's functions: for instance, R shows pi with only a handful of decimal places while it is certainly storing many more. Without this "representation versus actual precision" model, R's console would be unnecessarily full of decimal places and such, all the time.)
If I update your code to explicitly include zone:
as.POSIXct(c('2017-03-24 02:59:59', '2017-03-24 03:00:00'), tz="Israel")
# [1] "2017-03-24 IST" "2017-03-24 IST"
as.POSIXct(c('2017-03-24 02:59:59', '2017-03-24 03:00:00'), tz="Israel") + 1
# [1] "2017-03-24 00:00:01 IST" "2017-03-24 00:00:01 IST"
In the second case, I added one second to the times, and you see the time is now there. You can look at the internals to see it in a different way:
dput(as.POSIXct(c('2017-03-24 02:59:59', '2017-03-24 03:00:00'), tz="Israel"))
# structure(c(1490306400, 1490306400), class = c("POSIXct", "POSIXt"
# ), tzone = "Israel")
dput(as.POSIXct(c('2017-03-24 02:59:59', '2017-03-24 03:00:00'), tz="Israel")+1)
# structure(c(1490306401, 1490306401), tzone = "Israel", class = c("POSIXct",
# "POSIXt"))
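Times are stored as floating-point numbers (seconds since 1970-01-01 00:00:00 UTC) with a special class. Between the two (without and with a 1-second addition), you can see that the numbers differ by exactly one. Note that both stored values are local midnight rather than 02:59:59 and 03:00:00. A likely explanation, assuming the format-guessing behavior of the R version that produced this output, is that as.POSIXct fell back to a date-only format because the first string could not be parsed with its time in this zone. If the time of day matters, specify format yourself.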
A third way to confirm is to take the "missing time" POSIXct objects and explicitly format them as strings (the result is no longer POSIXct, but it's just for demo):
a <- as.POSIXct(c('2017-03-24 02:59:59', '2017-03-24 03:00:00'), tz="Israel")
a
# [1] "2017-03-24 IST" "2017-03-24 IST"
format(a, format="the time is %Y-%m-%d %H:%M:%S")
# [1] "the time is 2017-03-24 00:00:00" "the time is 2017-03-24 00:00:00"
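The printing rule behind this can be seen in isolation. The following is a small sketch with made-up values in UTC (chosen so that no daylight saving transition is involved): the default format drops the time only when every element is exactly midnight.

```r
# Illustrative values, not from the question; UTC avoids any DST effects.
midnight <- as.POSIXct("2017-03-24 00:00:00", tz = "UTC")
noon     <- as.POSIXct("2017-03-24 12:00:00", tz = "UTC")

# Default formatting: date only when all times are midnight,
# date and time otherwise.
default_midnight <- format(midnight)
default_noon     <- format(noon)
default_mixed    <- format(c(midnight, noon))

# An explicit format always shows the stored time.
explicit_midnight <- format(midnight, "%Y-%m-%d %H:%M:%S")

print(default_midnight)
print(default_noon)
print(default_mixed)
print(explicit_midnight)
```

So a date-only printout means the stored time is midnight; it does not mean the object lacks a time component.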
- Why does it fail to convert the first string when the format is specified?
As @Dave2e commented, according to the daylight savings conversions, that time "never happened".
According to https://www.timeanddate.com/time/change/israel/jerusalem?year=2017:
Mar 24, 2017 - Daylight Saving Time Started
When local standard time was about to reach Friday, March 24, 2017, 2:00:00 am clocks were turned forward 1 hour to Friday, March 24, 2017, 3:00:00 am local daylight time instead.
I interpret that to mean that the clock shifted from 01:59:59 to 03:00:00, so 02:**:** never happened. R is telling you with the NA that that time should not have occurred. There are certainly ways (hacks) you can infer that this is the case: find all NA values, then attempt to re-convert using plus or minus an hour; if the new value is not NA, then you found another instance where R thinks that time is not possible. If it is still NA, then there must be something else wrong with the string (additional characters, different order, etc.).
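The re-conversion hack could be sketched roughly as follows. The helper name in_dst_gap, its arguments, and the zone name "Asia/Jerusalem" are hypothetical choices for illustration. The sketch also assumes that your platform returns NA for nonexistent local times, as in the output above; how such times are handled can vary between systems and R versions, so treat the result as a hint rather than proof.

```r
# Hypothetical helper: flag strings that fail to parse in `tz` but parse
# fine once the wall-clock time is shifted by one hour in either direction.
in_dst_gap <- function(x, tz, fmt = "%Y-%m-%d %H:%M:%S") {
  local <- as.POSIXct(x, format = fmt, tz = tz)

  # UTC has no daylight saving time, so every well-formed string parses there.
  utc <- as.POSIXct(x, format = fmt, tz = "UTC")

  # Shift the wall-clock reading by +/- 1 hour and try the target zone again.
  later   <- as.POSIXct(format(utc + 3600, fmt), format = fmt, tz = tz)
  earlier <- as.POSIXct(format(utc - 3600, fmt), format = fmt, tz = tz)

  # Failed originally, but a neighboring hour is valid: probably a DST gap.
  is.na(local) & (!is.na(later) | !is.na(earlier))
}

x <- c("2017-03-24 02:59:59", "2017-03-24 12:00:00", "not a time")
flags <- in_dst_gap(x, tz = "Asia/Jerusalem")  # assumed zone name
flags
```

A string that is simply malformed stays NA under every shift and is therefore not flagged, which matches the "something else wrong with the string" case.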
In my experience, I have not found this logic to ever be incorrect (though I don't know with certainty that it is flawless), even if it seems annoying. When I thought it might have been incorrect, I have always found something else that explained why I think I have that precise time: