Add column to data.table specifying if time is between time points in another table

I have a table (dt1) of fairly regular time points, like this:

      DATE        TIME    READING
2022-02-02    11:50:23     123.34
2022-02-02    11:50:59     125.66
2022-02-02    11:51:16     159.23
2022-02-02    11:52:34     234.22

etc... Note that the interval between time points can be very irregular.

I have another table (dt2) specifying the start and end times of treatments:

      DATE    RED_START     RED_END    GREEN_START    GREEN_END
2022-02-02     11:50:30    11:51:12       11:52:12     11:53:17
2022-02-02     11:54:10    11:55:09       11:56:30     11:57:15

It specifies when the treatment was "RED" and when it was "GREEN". Note that there are certain time periods that are neither "RED" or "GREEN".

I'm trying to add a column to the first table to say what the treatment was at the time point given for each row. The desired output would look like this:

      DATE        TIME    READING   TREATMENT
2022-02-02    11:50:23     123.34        NULL
2022-02-02    11:50:59     125.66         RED
2022-02-02    11:51:16     159.23        NULL
2022-02-02    11:52:34     234.22       GREEN

I've tried using a non-equi join between the two tables to generate helper columns. e.g.

dt1[, RED := c("TRUE","FALSE")[
    dt2[.SD, on=.(DATE, RED_START<=TIME, RED_END>=TIME),
               by=.EACHI]
]]

But I've not got the hang of the syntax and it's not working. I'm not sure if I'm stating the multiple conditions incorrectly, or if the non-equi join requires additional info. I would prefer answers using data.table if possible, to keep it consistent with the rest of my code.

Solution

This is a case for data.table::foverlaps().

However, your data in an odd format, particularly dt2.

Desired format

You need dt1 to have a combined datetime column, and a dummy end time (to create a window to overlap with dt2).

#          DATE     TIME READING            datetime               dummy
#        <IDat>  <ITime>   <num>              <POSc>              <POSc>
# 1: 2022-02-02 11:50:23  123.34 2022-02-02 11:50:23 2022-02-02 11:50:23
# 2: 2022-02-02 11:50:59  125.66 2022-02-02 11:50:59 2022-02-02 11:50:59
# 3: 2022-02-02 11:51:16  159.23 2022-02-02 11:51:16 2022-02-02 11:51:16
# 4: 2022-02-02 11:52:34  234.22 2022-02-02 11:52:34 2022-02-02 11:52:34

dt2 is in a bizarre half-long, half-wide format. It should look like this:

#     color               start                 end
#    <char>              <POSc>              <POSc>
# 1:    red 2022-02-02 11:50:30 2022-02-02 11:51:12
# 2:  green 2022-02-02 11:52:12 2022-02-02 11:53:17
# 3:    red 2022-02-02 11:54:10 2022-02-02 11:55:09
# 4:  green 2022-02-02 11:56:30 2022-02-02 11:57:15

Getting the data into shape

I love data.table, but this is one of those data wrangling cases where the tidyverse would be easier. This works but it's not beautiful:

as_datetime <- function(x, y) {
    as.POSIXct(paste(x, y))
}

dt1[, datetime := as_datetime(DATE, TIME)][
    ,
    dummy := datetime
]

time_cols <- grep("START|END", names(dt2), value = TRUE)
dt2 <- dt2[, (tolower(time_cols)) := lapply(
    .SD,
    \(y) as_datetime(DATE, y)
), .SDcols = time_cols] |>
    melt(
        id.vars = "DATE",
        measure.vars = patterns("[a-z]_[a-z]")
    )
dt2[
    ,
    c("color", "period") := tstrsplit(variable, "_")
]

dt2 <- dcast(dt2, color ~ period, fun.aggregate = list)[, .(
    color = rep(color, times = lengths(end)),
    start = as.POSIXct(unlist(start), origin = "1970-01-01"),
    end = as.POSIXct(unlist(end), origin = "1970-01-01")
)]

The join

Once the data is in the right format, it's simply a case of:

setkey(dt2, start, end)
foverlaps(dt1, dt2, by.x = c("datetime", "dummy"))[
    ,
    .(datetime, READING, start, end, color)
]

#               datetime READING               start                 end  color
#                 <POSc>   <num>              <POSc>              <POSc> <char>
# 1: 2022-02-02 11:50:23  123.34                <NA>                <NA>   <NA>
# 2: 2022-02-02 11:50:59  125.66 2022-02-02 11:50:30 2022-02-02 11:51:12    red
# 3: 2022-02-02 11:51:16  159.23                <NA>                <NA>   <NA>
# 4: 2022-02-02 11:52:34  234.22 2022-02-02 11:52:12 2022-02-02 11:53:17  green