Tags: r, datetime, na, strptime

R treats my variable as NA even though it is converted successfully by strptime or as.POSIXct


d$Accessed.Time<-strptime(d$accessed_at,format="%Y-%m-%d %H:%M:%S")
d$Counselor.Added.Time<-strptime(d$counselor_added_at,format="%Y-%m-%d %H:%M:%S")
d$logtime<-as.numeric(d$Accessed.Time-d$Counselor.Added.Time,units="days")
View(d[which(is.na(d$logtime)),
c("accessed_at","Accessed.Time","counselor_added_at","Counselor.Added.Time","logtime")])

First I converted d$accessed_at and d$counselor_added_at to R date-time variables, then subtracted one from the other and stored the result in d$logtime. The weirdest thing is that R treats certain d$Counselor.Added.Time values as NA even though they appear to have been converted successfully.

[Screenshot: R treats my datetime variable as NA even though it's converted successfully]

The screenshot above shows the output of that last View statement in R.

is.na() returns TRUE for Counselor.Added.Time in all of these observations, and arithmetic operations on them fail, even though the values appear to have been converted successfully.

Does anyone know what's going on?

It appears that this error is specific to these particular times.

I tried this: a<-strptime("2015-03-08 02:33:07",format="%Y-%m-%d %H:%M:%S") and is.na(a) returned TRUE


Solution

  • You can get confusing behaviour with the change to and from daylight saving time.
    For example, in Melbourne, Australia, the time 2:30 am on 7 October 2012 doesn’t exist because clocks were moved forward one hour, from 2 am to 3 am. R returns NA if we attempt to use that time.

    ISOdatetime(2012,10,7,2,30,0, tz='Australia/Melbourne')
    
    [1] NA
    

    The behaviour of strptime is interesting: the conversion is done and the value looks OK, but it is actually missing.

    x <- strptime('2012-10-7 2:30:0',format="%Y-%m-%d %H:%M:%S", tz='Australia/Melbourne')
    x
    #[1] "2012-10-07 02:30:00"
    is.na(x)
    #[1] TRUE
    as.numeric(x)
    #[1] NA
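
Because the printed value looks fine, it can help to flag such rows explicitly. The following is a minimal sketch, not a definitive check: `in_dst_gap` is a hypothetical helper name, and it assumes the strings use the same format as above. It marks a string as suspicious when it parses as a plain date-time (in UTC, which has no daylight saving) but comes back NA in the requested zone. How nonexistent local times are handled can vary with platform and R version, so treat the result as a diagnostic only.

```r
# Hypothetical helper: TRUE where a string parses cleanly as a plain
# date-time (checked in UTC) but is NA when interpreted in zone `tz`.
in_dst_gap <- function(x, tz, format = "%Y-%m-%d %H:%M:%S") {
  parses_at_all <- !is.na(as.POSIXct(x, format = format, tz = "UTC"))
  valid_in_zone <- !is.na(as.POSIXct(x, format = format, tz = tz))
  parses_at_all & !valid_in_zone
}

times <- c("2012-10-07 02:30:00",  # the nonexistent Melbourne time from above
           "2012-10-07 03:30:00",  # a time that does exist
           "not a time")           # malformed input, not a DST problem
flags <- in_dst_gap(times, tz = "Australia/Melbourne")
flags
```

Malformed strings are deliberately not flagged, so the helper separates "bad input" from "valid-looking time that does not exist in this zone".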
    

    Let's try a time that does exist:

    x <- strptime('2012-10-7 3:30:0',format="%Y-%m-%d %H:%M:%S", tz='Australia/Melbourne')
    x
    #[1] "2012-10-07 03:30:00 AEDT"
    is.na(x)
    # [1] FALSE
    

    These problems go away if you specify the time zone as UTC:

    x <- strptime('2012-10-7 2:30:0',format="%Y-%m-%d %H:%M:%S", tz='UTC')
    x
    #[1] "2012-10-07 02:30:00 UTC"
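
Applied to the code in the question, the same idea might look like the sketch below. The data frame is a made-up stand-in for `d` (only the timestamp 2015-03-08 02:33:07 comes from the question), and it assumes the timestamps were recorded in UTC or that only differences between them matter. It uses as.POSIXct rather than strptime, since POSIXct values are easier to keep as data frame columns. If the data are really local wall-clock times, parsing them as UTC can shift differences by an hour across a daylight saving change.

```r
# Toy stand-in for the question's data frame `d`; values are illustrative.
d <- data.frame(
  accessed_at        = c("2015-03-08 05:00:00", "2015-03-09 02:33:07"),
  counselor_added_at = c("2015-03-08 02:33:07", "2015-03-08 02:33:07"),
  stringsAsFactors   = FALSE
)

fmt <- "%Y-%m-%d %H:%M:%S"

# Pass the time zone explicitly so the session's default zone is never used.
d$Accessed.Time        <- as.POSIXct(d$accessed_at, format = fmt, tz = "UTC")
d$Counselor.Added.Time <- as.POSIXct(d$counselor_added_at, format = fmt, tz = "UTC")

# Elapsed time in days between the two timestamps.
d$logtime <- as.numeric(difftime(d$Accessed.Time, d$Counselor.Added.Time,
                                 units = "days"))

# Rows that still fail to convert, if any.
d[is.na(d$logtime), ]
```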
    

    The other tricky thing is that, by default, the time zone is taken from whatever time zone your computer is using. Code might work or fail depending on whether you run it during summer (daylight saving time) or winter. It is safest to always specify the time zone rather than rely on a default value.
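
To see why the default matters, the short sketch below parses one string in two explicitly named zones. The example timestamp is an arbitrary choice for illustration. The same text maps to different instants depending on the zone, so a script that omits `tz` can give different results on machines configured for different zones.

```r
# The zone R falls back to when `tz` is omitted (can be NA on some systems).
Sys.timezone()

fmt <- "%Y-%m-%d %H:%M:%S"
s   <- "2015-01-15 12:00:00"

# Same string, two explicit zones.
utc <- as.POSIXct(s, format = fmt, tz = "UTC")
mel <- as.POSIXct(s, format = fmt, tz = "Australia/Melbourne")

# Melbourne is 11 hours ahead of UTC in January (daylight saving time),
# so noon in Melbourne occurs 11 hours before noon UTC.
gap_hours <- as.numeric(difftime(utc, mel, units = "hours"))
gap_hours
```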