r datetime dst

Is there a reliable way to detect POSIXlt objects representing a time which does not exist due to DST?


I have the following problem: the date column in the data I receive contains times that do not exist because of daylight saving time. For example, 2015-03-29 02:00 does not exist in Central European Time: DST takes effect on that day, so the clock jumps directly from 01:59 to 03:00.

Is there an easy and reliable way to determine if a date is valid with respect to daylight saving time?

This is not trivial because of the properties of the datetime classes.

# generating the invalid time as POSIXlt object
test <- strptime("2015-03-29 02:00", format="%Y-%m-%d %H:%M", tz="CET")

# the object seems to represent something at least partially reasonable, notice the missing timezone specification though
test
# [1] "2015-03-29 02:00:00"

# strangely enough this object is regarded as NA by is.na
is.na(test)
# [1] TRUE

# which is no surprise if you consider:
is.na.POSIXlt
# function (x) 
# is.na(as.POSIXct(x))

as.POSIXct(test)
# [1] NA

# inspecting the interior of my POSIXlt object:
unlist(test)
# sec    min   hour   mday    mon   year   wday   yday  isdst   zone gmtoff
# "0"    "0"    "2"   "29"    "2"  "115"    "0"   "87"   "-1"     ""     NA

So the simplest way I thought of is to check the isdst field of the POSIXlt object. The help for POSIXt describes the field as follows:

isdst
Daylight Saving Time flag. Positive if in force, zero if not, negative if unknown.

Is checking the isdst field safe, in the sense that this field is -1 only if the date is invalid due to DST changes, or can it be -1 for other reasons?

Info on version, platform and locale

R.version
# _                           
# platform       x86_64-w64-mingw32          
# arch           x86_64                      
# os             mingw32                     
# system         x86_64, mingw32             
# status                                     
# major          3                           
# minor          3.1                         
# year           2016                        
# month          06                          
# day            21                          
# svn rev        70800                       
# language       R                           
# version.string R version 3.3.1 (2016-06-21)
# nickname       Bug in Your Hair            
Sys.getlocale()
# [1] "LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

Solution

  • The value of as.POSIXct(test) seems to be platform dependent, which makes it harder to find a reliable method. On my Windows machine (R 3.3.1), as.POSIXct(test) produces NA, as the OP also reported. However, on my Linux platform (same R version), I get the following:

    times <- c("2015-03-29 01:00",
               "2015-03-29 02:00",
               "2015-03-29 03:00")
    
    test <- strptime(times, format="%Y-%m-%d %H:%M", tz="CET")
    
    test
    #[1] "2015-03-29 01:00:00 CET"  "2015-03-29 02:00:00 CEST" "2015-03-29 03:00:00 CEST"
    as.POSIXct(test)
    #[1] "2015-03-29 01:00:00 CET"  "2015-03-29 01:00:00 CET"  "2015-03-29 03:00:00 CEST"
    as.character(test)
    #[1] "2015-03-29 01:00:00" "2015-03-29 02:00:00" "2015-03-29 03:00:00"
    as.character(as.POSIXct(test))
    #[1] "2015-03-29 01:00:00" "2015-03-29 01:00:00" "2015-03-29 03:00:00"
    

    The one thing that we can rely on is not the actual value of as.POSIXct(test), but that it will be different from test when test is an invalid date/time:

    (as.character(test) == as.character(as.POSIXct(test))) %in% TRUE
    # [1]  TRUE FALSE  TRUE
    

    I'm not sure that as.character is strictly necessary here, but I include it just to ensure that we don't fall foul of any other odd behaviours of POSIX objects.
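The round-trip comparison above could be wrapped in a small helper. The following is only a sketch: the function name `is_nonexistent_time` and its arguments are hypothetical, not part of base R or of the original answer. It assumes, as in the outputs shown above, that strptime keeps the wall-clock fields of a nonexistent time unchanged and that as.POSIXct either shifts the time or returns NA. It uses format() with an explicit format string in place of as.character(), and it returns NA for strings that fail to parse, so that a parse failure is not mistaken for a DST gap.

```r
# Hypothetical helper (name and arguments are illustrative assumptions).
# Returns TRUE where a wall-clock time does not survive the round trip
# POSIXlt -> POSIXct, and NA where the string could not be parsed at all.
is_nonexistent_time <- function(x, tz, format = "%Y-%m-%d %H:%M") {
  lt <- strptime(x, format = format, tz = tz)

  # A failed parse leaves the fields NA. Check a field directly, because
  # is.na() on a POSIXlt object goes through as.POSIXct() and would
  # therefore also flag DST gaps on some platforms.
  parsed <- !is.na(lt$year)

  ct <- as.POSIXct(lt)

  # Compare the wall-clock representations; an NA on either side
  # counts as "not the same".
  out_fmt <- "%Y-%m-%d %H:%M:%S"
  same <- (format(lt, out_fmt) == format(ct, out_fmt)) %in% TRUE

  ifelse(parsed, !same, NA)
}

times <- c("2015-03-29 01:00",
           "2015-03-29 02:00",
           "2015-03-29 03:00")
flags <- is_nonexistent_time(times, tz = "CET")
```

Given the Windows and Linux results reported above, `flags` should mark only the second element, but since the underlying behaviour is platform dependent it is worth checking on the target system.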