d$Accessed.Time<-strptime(d$accessed_at,format="%Y-%m-%d %H:%M:%S")
d$Counselor.Added.Time<-strptime(d$counselor_added_at,format="%Y-%m-%d %H:%M:%S")
d$logtime<-as.numeric(d$Accessed.Time-d$Counselor.Added.Time,units="days")
View(d[which(is.na(d$logtime)),
c("accessed_at","Accessed.Time","counselor_added_at","Counselor.Added.Time","logtime")])
First I converted d$accessed_at and d$counselor_added_at to R date-time variables, subtracted one from the other, and stored the result in d$logtime. The weirdest thing is that R treats certain d$Counselor.Added.Time values as NA even though they appear to have been converted successfully.
The above screenshot is of that last View statement in R. is.na() returns TRUE for Counselor.Added.Time for all of these observations, and arithmetic on them fails, even though they appear to have been converted successfully.
Does anyone know what's going on?
It appears that this error is specific to these particular times. I tried this:
a<-strptime("2015-03-08 02:33:07",format="%Y-%m-%d %H:%M:%S")
and is.na(a) returned TRUE.
You can get confusing behaviour with the change to and from daylight saving time.
For example, in Melbourne, Australia, the time 2:30 am on 7 Oct 2012 doesn't exist because clocks were moved forward one hour, from 2 am to 3 am. R will return NA if we attempt to use that time.
ISOdatetime(2012,10,7,2,30,0, tz='Australia/Melbourne')
[1] NA
The behaviour of strptime is interesting: the conversion is done and the value looks OK when printed, but it is actually treated as missing.
x <- strptime('2012-10-7 2:30:0',format="%Y-%m-%d %H:%M:%S", tz='Australia/Melbourne')
x
#[1] "2012-10-07 02:30:00"
is.na(x)
#[1] TRUE
as.numeric(x)
# NA
Let's try a time that does exist:
x <- strptime('2012-10-7 3:30:0',format="%Y-%m-%d %H:%M:%S", tz='Australia/Melbourne')
x
#[1] "2012-10-07 03:30:00 AEDT"
is.na(x)
# [1] FALSE
These problems go away if you specify the time zone as UTC, which has no daylight saving transitions:
x <- strptime('2012-10-7 2:30:0',format="%Y-%m-%d %H:%M:%S", tz='UTC')
x
#[1] "2012-10-07 02:30:00 UTC"
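Applied to the code in the question, the same idea might look like the sketch below. The data frame here is made up for illustration and only borrows the question's column names. It also assumes the timestamps can reasonably be treated as UTC: if they were recorded in local time, differences that span a daylight saving transition will be off by an hour.

```r
# Hypothetical data frame that mirrors the column names in the question.
d <- data.frame(
  accessed_at        = c("2015-03-09 10:00:00", "2015-03-10 12:00:00"),
  counselor_added_at = c("2015-03-08 02:33:07", "2015-03-09 12:00:00"),
  stringsAsFactors   = FALSE
)

fmt <- "%Y-%m-%d %H:%M:%S"

# as.POSIXct() stores each time as a number of seconds, which suits data frame
# columns. tz = "UTC" has no daylight saving transitions, so no clock time is skipped.
d$Accessed.Time        <- as.POSIXct(d$accessed_at, format = fmt, tz = "UTC")
d$Counselor.Added.Time <- as.POSIXct(d$counselor_added_at, format = fmt, tz = "UTC")

# Elapsed time in days between the two events.
d$logtime <- as.numeric(difftime(d$Accessed.Time, d$Counselor.Added.Time,
                                 units = "days"))

# FALSE means every row produced a usable difference.
anyNA(d$logtime)
```

The first row uses the 2015-03-08 02:33:07 timestamp from the question, which no longer produces NA once the zone is fixed to UTC.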
The other tricky thing is that, by default, the time zone is taken from whatever time zone your computer is using. Code might work or not work depending on whether you run it during summer (daylight saving time) or winter. It is safest to always specify the time zone rather than rely on a default value.
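If the data really are in a local time zone, one way to find the affected rows is to compare parsing in that zone with parsing in UTC. This is a rough sketch: in_dst_gap is a hypothetical helper name, not part of base R, and how nonexistent local times are handled can vary by platform and R version, so check the result on your own system.

```r
# Hypothetical helper: TRUE where a string matches the format (it parses in UTC)
# but does not convert to a valid time in the zone `tz`.
in_dst_gap <- function(x, tz, format = "%Y-%m-%d %H:%M:%S") {
  ok_utc   <- !is.na(as.POSIXct(x, format = format, tz = "UTC"))
  ok_local <- !is.na(as.POSIXct(x, format = format, tz = tz))
  ok_utc & !ok_local
}

# Times just before, inside and after the Melbourne transition discussed above.
stamps <- c("2012-10-07 01:30:00", "2012-10-07 02:30:00", "2012-10-07 03:30:00")
flags <- in_dst_gap(stamps, tz = "Australia/Melbourne")
flags
```

On a system that behaves as shown earlier in this answer, only the 02:30 entry should be flagged. The time zone is passed explicitly, so the result does not depend on the computer's default setting.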