Search code examples
rdatetimeroundingnaposixlt

Why are some datetime values being deleted when I round the time in R?


I am importing a series of csv files into R. These contain a date/time column, id, and two columns of temperature values.

This would give an example of what the data looks like:

id<-c(1,2,3,4)
date.time<-as.character(c("12/03/17 00:21:28", "12/03/17 02:21:28", "12/03/17 04:21:28", "12/03/17 06:21:28"))
temp1<-c(-3.568,-3.568,-3.598,-3.598)
temp2<-c(-11.577,-11.577,-11.541,-11.433)
df<-data.frame(id,date.time,temp1,temp2)

Because the date/time are not in the format I want, I've been using strptime and formatting them as POSIXlt.

Like:

df$date.time<-strptime(df$date.time, "%d/%m/%y %H:%M:%S")
df$date.time<- as.POSIXlt(df$date.time, "%Y/%m/%d %H:%M:%S", tz="GMT0")

This works fine and gives data which looks like:

id  date.time            temp1   temp2
1   2017-03-12 0:21:28  -3.568  -11.577
2   2017-03-12 2:21:28  -3.568  -11.577
3   2017-03-12 4:21:28  -3.598  -11.541
4   2017-03-12 6:21:28  -3.598  -11.433

However, I then want to round the time part of the date.time column to the nearest hour. I've been using:

df$date.time<-round(df$date.time, units="hours")

This works just fine 99% of the time. However, in some of the files, R is deleting date.time values, seemingly at random, and giving an NA. Only one or two values in each file is being deleted and I can't see any reason these particular values would be deleted. For example:

id  date.time         temp1   temp2
1   2017-03-12 0:00:00  -3.568  -11.577
2   NA                  -3.568  -11.577
3   2017-03-12 4:00:00  -3.598  -11.541
4   2017-03-12 6:00:00  -3.598  -11.433

From what I've read, date/time values can be finicky but this just seems oddly random.

Does anyone know what might be causing this issue and if there is a better way to round the time part of POSIXlt values?

Update: It seems the only times which are deleted are those at 2am on March 12th. So, many of the times which should be rounded to 2017-03-12 02:00:00 are being replaced with NAs. But this does not happen with all of the csv files, only about half. Why is R having problems reading this particular date?

Thanks!


Solution

  • Not adding a timezone for strptime is corrupting your string.

    Lets look at the head of df$date.time. Missing/different time zones.

     head(df$date.time)
    [1] "2017-03-12 00:21:28 PST"
    [2] "2017-03-12 02:21:28"    
    [3] "2017-03-12 04:21:28 PDT"
    [4] "2017-03-12 06:21:28 PDT"
    

    By adding tz="GMT0" to your strptime function you should get desired result.

    df$date.time<-strptime(df$date.time, "%d/%m/%y %H:%M:%S", tz="GMT0")
    df$date.time<- as.POSIXlt(df$date.time, "%Y/%m/%d %H:%M:%S", tz="GMT0")
    df$date.time<-round(df$date.time, units="hours")
    
    > head(df$date.time)
    [1] "2017-03-12 00:00:00 GMT"
    [2] "2017-03-12 02:00:00 GMT"
    [3] "2017-03-12 04:00:00 GMT"
    [4] "2017-03-12 06:00:00 GMT"