r  time-series  posix  xts

Is there a way to do something like align.time() in reverse?


I have multiple data variables that are collected at 15-minute intervals. Some of the variables have timestamps that are slightly off because the internal clocks in the various sensors were not aligned exactly. To merge the various measurements easily, I want to align all timestamps to the closest 15-minute mark.

I want to use something like xts::align.time(), but this function always snaps forward. I want to be able to snap backward or, even better, use smart rounding rules that snap to whichever mark is closest. How can I do this?

Here is example code of what I'd like to do using align.time():

library(xts)
library(dplyr)

# Four readings taken 15 minutes apart, each 1 min 39 s past the mark
df <- data.frame(
  DateTime_UTC = as.POSIXct(c("2017-09-11 00:01:39", "2017-09-11 00:16:39",
                              "2017-09-11 00:31:39", "2017-09-11 00:46:39"),
                            tz = "", format = "%Y-%m-%d %H:%M:%S"),
  Value = c(1, 2, 6, 0.5),
  Variable = rep("Chloride", 4)
)

df %>%
  mutate(DateTime_UTC = align.time(DateTime_UTC, n = 60 * 15))

>        DateTime_UTC Value Variable
>1 2017-09-11 00:15:00   1.0 Chloride
>2 2017-09-11 00:30:00   2.0 Chloride
>3 2017-09-11 00:45:00   6.0 Chloride
>4 2017-09-11 01:00:00   0.5 Chloride

However, I'd prefer the snapped timestamps to look like this:

>        DateTime_UTC Value Variable
>1 2017-09-11 00:00:00   1.0 Chloride
>2 2017-09-11 00:15:00   2.0 Chloride
>3 2017-09-11 00:30:00   6.0 Chloride
>4 2017-09-11 00:45:00   0.5 Chloride

Solution

  • I had a look at align.time, and the method that applies here is align.time.POSIXct. You might assume you could supply a negative n, but the function does not allow it.

    But you can do two things: create your own align.time function, or use floor_date from the lubridate package. floor_date rounds down to the start of the given unit; lubridate's round_date rounds to the nearest unit instead. Check ?floor_date for all possible options.

    Creating your own function would look like what I did below: I just removed the negative restriction from align.time.POSIXct and named the result my_align_time.

    my_align_time <- function(x, n = 60) {
      structure(unclass(x) + (n - unclass(x) %% n), class=c("POSIXct","POSIXt"))
    }
    
    library(lubridate)
    library(dplyr)
    
    df %>%
      mutate(use_floor_date = floor_date(DateTime_UTC, unit = "15 mins"),
             use_my_align_time = my_align_time(DateTime_UTC, n = 60 * -15))
    
             DateTime_UTC Value Variable      use_floor_date   use_my_align_time
    1 2017-09-11 00:01:39   1.0 Chloride 2017-09-11 00:00:00 2017-09-11 00:00:00
    2 2017-09-11 00:16:39   2.0 Chloride 2017-09-11 00:15:00 2017-09-11 00:15:00
    3 2017-09-11 00:31:39   6.0 Chloride 2017-09-11 00:30:00 2017-09-11 00:30:00
    4 2017-09-11 00:46:39   0.5 Chloride 2017-09-11 00:45:00 2017-09-11 00:45:00
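
    One edge case is worth checking before relying on my_align_time with a negative n. Because the expression always adds n minus the remainder, a timestamp that already sits exactly on a 15-minute mark has a remainder of zero and is moved back a full interval (the mirror image of align.time moving such values forward). The sketch below illustrates this. floor_time is a hypothetical helper introduced only for this example, not a function from xts or lubridate; it subtracts just the remainder, so timestamps that are already aligned stay put.

    ```r
    # my_align_time() as defined in this answer, repeated so the example runs on its own
    my_align_time <- function(x, n = 60) {
      structure(unclass(x) + (n - unclass(x) %% n), class = c("POSIXct", "POSIXt"))
    }

    # Hypothetical variant (an assumption, not from xts or lubridate):
    # floor to a multiple of n seconds, leaving aligned timestamps unchanged
    floor_time <- function(x, n = 60) {
      structure(unclass(x) - unclass(x) %% n, class = c("POSIXct", "POSIXt"))
    }

    on_mark  <- as.POSIXct("2017-09-11 00:15:00", tz = "UTC")
    off_mark <- as.POSIXct("2017-09-11 00:16:39", tz = "UTC")

    my_align_time(on_mark,  n = -60 * 15)  # moved back a full 15 minutes
    my_align_time(off_mark, n = -60 * 15)  # floored to the preceding mark
    floor_time(on_mark,  n = 60 * 15)      # already aligned, so unchanged
    floor_time(off_mark, n = 60 * 15)      # floored to the preceding mark
    ```

    If your sensors can report readings that land exactly on the mark, the second form is probably the behavior you want when merging.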
    

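    Of course, the question now is which one is faster. With 1,000 timestamps, my_align_time is far faster: a median of about 31 microseconds against about 4,700 for floor_date in the run below, roughly a 150-fold difference. I would expect the gap to grow with more records, although only this size was measured. floor_date simply does more work per call: it checks that the date-time objects are valid, parses the unit, and so on. Note that the benchmark passes n = -60 * 100 to my_align_time rather than -60 * 15; the value of n does not change the amount of arithmetic performed.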

    library(microbenchmark)
    x <- Sys.time() + 1:1000
    
    microbenchmark(floor = floor_date(x, unit = "15 mins"),
                   align = my_align_time(x, n = -60 * 100))
    
    Unit: microseconds
      expr      min       lq       mean   median       uq      max neval
     floor 4598.913 4670.447 4738.57723 4728.228 4781.770 5188.149   100
     align   25.454   27.210   32.61044   31.305   33.646   75.484   100
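
    The question also asked for rounding to the closest mark rather than always snapping backward. As a minimal sketch in the same spirit as my_align_time, the helper below rounds the underlying seconds to the nearest multiple of n. round_time is a hypothetical name chosen for this example (it is not part of xts or lubridate), ties are assumed to round up, and it has not been benchmarked here. lubridate's round_date is the packaged alternative.

    ```r
    # Hypothetical helper (an assumption, not from xts or lubridate):
    # round POSIXct timestamps to the nearest multiple of n seconds
    round_time <- function(x, n = 60 * 15) {
      secs <- as.numeric(x)                 # seconds since 1970-01-01 UTC
      tz <- attr(x, "tzone")                # keep the input's time zone, if any
      as.POSIXct(floor(secs / n + 0.5) * n, # adding 0.5 makes ties round up
                 origin = "1970-01-01",
                 tz = if (is.null(tz)) "" else tz[1])
    }

    x <- as.POSIXct(c("2017-09-11 00:01:39", "2017-09-11 00:16:39",
                      "2017-09-11 00:29:10", "2017-09-11 00:46:39"), tz = "UTC")

    # The third value is closer to 00:30 than to 00:15, so it snaps forward;
    # the others snap back to the mark just before them
    round_time(x)
    ```

    Because the rounding is done on seconds since the epoch, the marks are multiples of n counted from 1970-01-01 00:00:00 UTC, which matches the usual quarter-hour marks for time zones whose offset is a multiple of 15 minutes.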