I have the following problem: the date column in the data I receive contains date-times that do not exist because of daylight saving time. (For example, 2015-03-29 02:00 does not exist in Central European Time, because the clock jumps directly from 01:59 to 03:00 when DST takes effect on that day.)
Is there an easy and reliable way to determine if a date is valid with respect to daylight saving time?
This is not trivial because of the properties of the datetime classes.
# generating the invalid time as POSIXlt object
test <- strptime("2015-03-29 02:00", format="%Y-%m-%d %H:%M", tz="CET")
# the object seems to represent something at least partially reasonable, notice the missing timezone specification though
test
# [1] "2015-03-29 02:00:00"
# strangely enough this object is regarded as NA by is.na
is.na(test)
# [1] TRUE
# which is no surprise if you consider:
is.na.POSIXlt
# function (x)
# is.na(as.POSIXct(x))
as.POSIXct(test)
# [1] NA
# inspecting the interior of my POSIXlt object:
unlist(test)
# sec min hour mday mon year wday yday isdst zone gmtoff
# "0" "0" "2" "29" "2" "115" "0" "87" "-1" "" NA
So the simplest way I could think of is to check the isdst field of the POSIXlt object. The help for POSIXt describes the field as follows:
isdst
Daylight Saving Time flag. Positive if in force, zero if not, negative if unknown.
Is checking the isdst field safe, in the sense that this field is -1 only if the date is invalid due to DST changes, or can it be -1 for other reasons?
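One way to probe this is to look at isdst for an input that is NA for a reason unrelated to DST. This is only a minimal sketch, assuming a "CET" time zone is available on the system; the input string and the variable name bad are made up for illustration:

```r
# An input that cannot be parsed at all (nothing to do with DST).
bad <- strptime("not a date", format = "%Y-%m-%d %H:%M", tz = "CET")

# If isdst is negative here as well, a negative flag alone cannot
# distinguish DST gaps from other kinds of invalid input.
is.na(bad)
bad$isdst
```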
Info on version, platform and locale
R.version
# _
# platform x86_64-w64-mingw32
# arch x86_64
# os mingw32
# system x86_64, mingw32
# status
# major 3
# minor 3.1
# year 2016
# month 06
# day 21
# svn rev 70800
# language R
# version.string R version 3.3.1 (2016-06-21)
# nickname Bug in Your Hair
Sys.getlocale()
# [1] "LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
The value of as.POSIXct(test) seems to be platform dependent, which adds a layer of complexity to finding a reliable method. On my Windows machine (R 3.3.1), as.POSIXct(test) produces NA, as also reported by the OP. However, on my Linux platform (same R version), I get the following:
times <- c("2015-03-29 01:00",
           "2015-03-29 02:00",
           "2015-03-29 03:00")
test <- strptime(times, format="%Y-%m-%d %H:%M", tz="CET")
test
#[1] "2015-03-29 01:00:00 CET" "2015-03-29 02:00:00 CEST" "2015-03-29 03:00:00 CEST"
as.POSIXct(test)
#[1] "2015-03-29 01:00:00 CET" "2015-03-29 01:00:00 CET" "2015-03-29 03:00:00 CEST"
as.character(test)
#[1] "2015-03-29 01:00:00" "2015-03-29 02:00:00" "2015-03-29 03:00:00"
as.character(as.POSIXct(test))
#[1] "2015-03-29 01:00:00" "2015-03-29 01:00:00" "2015-03-29 03:00:00"
The one thing that we can rely on is not the actual value of as.POSIXct(test), but that it will differ from test when test is an invalid date/time:
(as.character(test) == as.character(as.POSIXct(test))) %in% TRUE
# TRUE FALSE TRUE
I'm not sure that as.character is strictly necessary here, but I include it to ensure that we don't fall foul of any other odd behaviours of POSIX objects.
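The round-trip comparison above could be wrapped in a small helper. This is a sketch rather than a definitive implementation: the function name is_valid_local_time and its default arguments are my own choices, and it uses format() with an explicit format string instead of as.character() so that both sides are rendered the same way.

```r
# Hypothetical helper: TRUE where the wall-clock time exists in `tz`,
# FALSE where it falls into a DST gap or cannot be parsed at all.
is_valid_local_time <- function(x, format = "%Y-%m-%d %H:%M", tz = "CET") {
  lt <- strptime(x, format = format, tz = tz)   # POSIXlt, keeps the fields as typed
  ct <- as.POSIXct(lt)                          # NA or a shifted time for nonexistent input
  # Render both with the same explicit format so the comparison does not
  # depend on default printing rules.
  before <- format(lt, "%Y-%m-%d %H:%M:%S")
  after  <- format(ct, "%Y-%m-%d %H:%M:%S", tz = tz)
  # %in% TRUE turns NA comparisons into FALSE.
  (before == after) %in% TRUE
}

times <- c("2015-03-29 01:00", "2015-03-29 02:00", "2015-03-29 03:00")
is_valid_local_time(times)
```

For the three example times this should reproduce the TRUE FALSE TRUE pattern shown above, whether the platform returns NA or a shifted time for the nonexistent value. Note that unparseable strings also come back as FALSE, so the helper does not tell DST gaps apart from other bad input.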