I have multiple data variables that are collected at 15-minute intervals, but some of the variables have timestamps that are slightly off because the internal clocks in the various sensors were not aligned exactly. To merge the measurements easily, I want to align all timestamps to the closest 15-minute mark.
I want to use something like xts::align.time(), but this function always snaps forwards. I want to be able to snap backwards or, even better, use smart rounding rules. How can I do this?
Here is example code of what I'd like to do using align.time():
library(xts)
library(dplyr)
# Build the example data frame directly; as.POSIXct() is the public
# constructor (as.POSIXlt.character() is an internal method)
df <- data.frame(
  DateTime_UTC = as.POSIXct(c("2017-09-11 00:01:39", "2017-09-11 00:16:39",
                              "2017-09-11 00:31:39", "2017-09-11 00:46:39"),
                            tz = "UTC", format = "%Y-%m-%d %H:%M:%S"),
  Value = c(1, 2, 6, 0.5),
  Variable = rep("Chloride", 4)
)
df %>%
  mutate(DateTime_UTC = align.time(DateTime_UTC, n = 60 * 15))
> DateTime_UTC Value Variable
>1 2017-09-11 00:15:00 1.0 Chloride
>2 2017-09-11 00:30:00 2.0 Chloride
>3 2017-09-11 00:45:00 6.0 Chloride
>4 2017-09-11 01:00:00 0.5 Chloride
However, I'd prefer the snapping to produce this:
> DateTime_UTC Value Variable
>1 2017-09-11 00:00:00 1.0 Chloride
>2 2017-09-11 00:15:00 2.0 Chloride
>3 2017-09-11 00:30:00 6.0 Chloride
>4 2017-09-11 00:45:00 0.5 Chloride
I had a look at align.time, and the method you need is align.time.POSIXct. I assumed you could supply a negative n, but you can't.
There are two things you can do: write your own align.time function, or use floor_date from the lubridate package. floor_date rounds down to the nearest unit boundary (round_date rounds to the nearest boundary and ceiling_date rounds up). Check ?floor_date for all the options.
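Since the question also asks for smarter rounding rules, here is a minimal sketch of snapping to the nearest 15-minute mark in base R, in the same spirit as lubridate's round_date. The helper name snap_nearest is hypothetical (it is not part of xts or lubridate), and it assumes the marks are multiples of n seconds counted from the Unix epoch, which matches clock quarter hours when the time zone's UTC offset is a multiple of 15 minutes.

```r
# Hypothetical helper (not from xts or lubridate): snap each timestamp to the
# nearest multiple of n seconds since the Unix epoch.
snap_nearest <- function(x, n = 60 * 15) {
  secs <- as.numeric(x)
  # floor(v + 0.5) sends exact halves upwards; base round() would instead
  # round halves to the even neighbour
  snapped <- floor(secs / n + 0.5) * n
  structure(snapped, class = c("POSIXct", "POSIXt"), tzone = attr(x, "tzone"))
}

x <- as.POSIXct(c("2017-09-11 00:01:39", "2017-09-11 00:14:10",
                  "2017-09-11 00:31:39"), tz = "UTC")
format(snap_nearest(x), "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

With this rule a reading stamped 00:14:10 goes to 00:15:00 rather than 00:00:00, which may suit sensors whose clocks run slightly fast as well as slightly slow.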
Creating your own function would look like what I did below. I just removed the restriction on negative n from align.time.POSIXct and called the resulting function my_align_time.
# align.time.POSIXct without the check that n is positive
my_align_time <- function(x, n = 60) {
  structure(unclass(x) + (n - unclass(x) %% n), class = c("POSIXct", "POSIXt"))
}
library(lubridate)
library(dplyr)
df %>%
  mutate(use_floor_date = floor_date(DateTime_UTC, unit = "15 mins"),
         use_my_align_time = my_align_time(DateTime_UTC, n = 60 * -15))
         DateTime_UTC Value Variable      use_floor_date   use_my_align_time
1 2017-09-11 00:01:39   1.0 Chloride 2017-09-11 00:00:00 2017-09-11 00:00:00
2 2017-09-11 00:16:39   2.0 Chloride 2017-09-11 00:15:00 2017-09-11 00:15:00
3 2017-09-11 00:31:39   6.0 Chloride 2017-09-11 00:30:00 2017-09-11 00:30:00
4 2017-09-11 00:46:39   0.5 Chloride 2017-09-11 00:45:00 2017-09-11 00:45:00
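One caveat with the negative-n approach, as far as I can tell from the arithmetic: when a timestamp already sits exactly on a 15-minute mark, x %% n is 0, so the function adds the full n and moves the value one whole interval back, whereas floor_date leaves it alone. The sketch below illustrates this and shows a variant that keeps aligned timestamps in place. The name floor_align_time is hypothetical, not something xts provides.

```r
# The function from the answer, repeated so this snippet runs on its own
my_align_time <- function(x, n = 60) {
  structure(unclass(x) + (n - unclass(x) %% n), class = c("POSIXct", "POSIXt"))
}

# Hypothetical variant: floor to a multiple of n seconds (n positive),
# leaving timestamps that are already aligned untouched
floor_align_time <- function(x, n = 60 * 15) {
  structure(unclass(x) - unclass(x) %% n, class = c("POSIXct", "POSIXt"))
}

on_mark  <- as.POSIXct("2017-09-11 00:15:00", tz = "UTC")
off_mark <- as.POSIXct("2017-09-11 00:16:39", tz = "UTC")
fmt <- "%Y-%m-%d %H:%M:%S"

format(my_align_time(on_mark, n = 60 * -15), fmt, tz = "UTC")   # moved back a full interval
format(floor_align_time(on_mark), fmt, tz = "UTC")              # stays on the mark
format(floor_align_time(off_mark), fmt, tz = "UTC")             # floored as before
```

If some of your sensors report exactly on the quarter hour, the floor variant (or floor_date) is probably the safer choice before merging.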
Of course, now the question is which one is faster. With 1,000 timestamps, my_align_time is a whole lot faster than floor_date (a median of roughly 31 microseconds against roughly 4,700), and the gap should grow with the number of records. Bear in mind that floor_date does a lot of extra work: it checks that the date-time objects are valid, validates the unit, and so on.
library(microbenchmark)
x <- Sys.time() + 1:1000
# Use the same 15-minute interval for both functions
microbenchmark(floor = floor_date(x, unit = "15 mins"),
               align = my_align_time(x, n = -60 * 15))
Unit: microseconds
expr min lq mean median uq max neval
floor 4598.913 4670.447 4738.57723 4728.228 4781.770 5188.149 100
align 25.454 27.210 32.61044 31.305 33.646 75.484 100