I have a table (dt1
) of fairly regular time points, like this:
DATE TIME READING
2022-02-02 11:50:23 123.34
2022-02-02 11:50:59 125.66
2022-02-02 11:51:16 159.23
2022-02-02 11:52:34 234.22
etc... Note that the interval between time points can be very irregular.
I have another table (dt2
) specifying the start and end times of treatments:
DATE RED_START RED_END GREEN_START GREEN_END
2022-02-02 11:50:30 11:51:12 11:52:12 11:53:17
2022-02-02 11:54:10 11:55:09 11:56:30 11:57:15
It specifies when the treatment was "RED" and when it was "GREEN". Note that there are certain time periods that are neither "RED" or "GREEN".
I'm trying to add a column to the first table to say what the treatment was at the time point given for each row. The desired output would look like this:
DATE TIME READING TREATMENT
2022-02-02 11:50:23 123.34 NULL
2022-02-02 11:50:59 125.66 RED
2022-02-02 11:51:16 159.23 NULL
2022-02-02 11:52:34 234.22 GREEN
I've tried using a non-equi join between the two tables to generate helper columns. e.g.
dt1[, RED := c("TRUE","FALSE")[
dt2[.SD, on=.(DATE, RED_START<=TIME, RED_END>=TIME),
by=.EACHI]
]]
But I've not got the hang of the syntax and it's not working. I'm not sure if I'm stating the multiple conditions incorrectly, or if the non-equi join requires additional info. I would prefer answers using data.table
if possible, to keep it consistent with the rest of my code.
This is a case for data.table::foverlaps()
.
However, your data in an odd format, particularly dt2
.
You need dt1
to have a combined datetime column, and a dummy end time (to create a window to overlap with dt2
).
# DATE TIME READING datetime dummy
# <IDat> <ITime> <num> <POSc> <POSc>
# 1: 2022-02-02 11:50:23 123.34 2022-02-02 11:50:23 2022-02-02 11:50:23
# 2: 2022-02-02 11:50:59 125.66 2022-02-02 11:50:59 2022-02-02 11:50:59
# 3: 2022-02-02 11:51:16 159.23 2022-02-02 11:51:16 2022-02-02 11:51:16
# 4: 2022-02-02 11:52:34 234.22 2022-02-02 11:52:34 2022-02-02 11:52:34
dt2
is in a bizarre half-long, half-wide format. It should look like this:
# color start end
# <char> <POSc> <POSc>
# 1: red 2022-02-02 11:50:30 2022-02-02 11:51:12
# 2: green 2022-02-02 11:52:12 2022-02-02 11:53:17
# 3: red 2022-02-02 11:54:10 2022-02-02 11:55:09
# 4: green 2022-02-02 11:56:30 2022-02-02 11:57:15
I love data.table
, but this is one of those data wrangling cases where the tidyverse
would be easier. This works but it's not beautiful:
as_datetime <- function(x, y) {
as.POSIXct(paste(x, y))
}
dt1[, datetime := as_datetime(DATE, TIME)][
,
dummy := datetime
]
time_cols <- grep("START|END", names(dt2), value = TRUE)
dt2 <- dt2[, (tolower(time_cols)) := lapply(
.SD,
\(y) as_datetime(DATE, y)
), .SDcols = time_cols] |>
melt(
id.vars = "DATE",
measure.vars = patterns("[a-z]_[a-z]")
)
dt2[
,
c("color", "period") := tstrsplit(variable, "_")
]
dt2 <- dcast(dt2, color ~ period, fun.aggregate = list)[, .(
color = rep(color, times = lengths(end)),
start = as.POSIXct(unlist(start), origin = "1970-01-01"),
end = as.POSIXct(unlist(end), origin = "1970-01-01")
)]
Once the data is in the right format, it's simply a case of:
setkey(dt2, start, end)
foverlaps(dt1, dt2, by.x = c("datetime", "dummy"))[
,
.(datetime, READING, start, end, color)
]
# datetime READING start end color
# <POSc> <num> <POSc> <POSc> <char>
# 1: 2022-02-02 11:50:23 123.34 <NA> <NA> <NA>
# 2: 2022-02-02 11:50:59 125.66 2022-02-02 11:50:30 2022-02-02 11:51:12 red
# 3: 2022-02-02 11:51:16 159.23 <NA> <NA> <NA>
# 4: 2022-02-02 11:52:34 234.22 2022-02-02 11:52:12 2022-02-02 11:53:17 green