Assume the following dataset (called Active
) contains date and time records of when individuals entered and left a room over several days. Individuals are denoted by a unique ID
, Start
is the time they entered the room, and End
is the time they left the room.
ID Date Start End
1 20-5-2021 13:00:00 13:03:00
1 20-5-2021 13:05:00 13:09:45
1 21-5-2021 12:01:00 13:00:00
2 20-5-2021 13:01:00 13:09:50
2 21-5-2021 13:15:05 14:00:00
2 21-5-2021 15:01:00 15:10:00
Now we have another dataset (called Detections
) that contains when each individual (ID
) was seen in the room:
ID Date Time
1 20-5-2021 9:02:32
1 20-5-2021 11:02:32
1 20-5-2021 13:02:31
1 20-5-2021 13:08:00
1 20-5-2021 13:08:30
2 20-5-2021 12:07:09
2 20-5-2021 12:30:10
2 20-5-2021 13:07:09
2 21-5-2021 13:50:07
2 21-5-2021 13:51:56
Notice that there are instances where the ID
s were seen outside of the Start
and End
times in Active
. We want to filter the rows in Detections
where each ID
was seen within any of the Start
and End
boundaries specified in Active
. What is the best way to format the Date
and Time
data, and how can we apply this kind of filter with it in R?
The end result would return the rows from Detections
where an ID
was seen within the Start
and End
boundaries specified in Active
, like this:
1 20-5-2021 13:02:31
1 20-5-2021 13:08:00
1 20-5-2021 13:08:30
2 20-5-2021 13:07:09
2 21-5-2021 13:50:07
2 21-5-2021 13:51:56
Combine date and time column to create date-time and use fuzzyjoin
package to join in range
library(dplyr)
library(lubridate)
library(tidyr)
Active %>%
mutate(Start = dmy_hms(paste(Date, Start)),
End = dmy_hms(paste(Date, End))) %>%
select(-Date) %>%
fuzzyjoin::fuzzy_inner_join(Detections %>%
unite(Datetime, Date, Time, sep = ' ') %>%
mutate(Datetime = dmy_hms(Datetime)),
by = c('ID', 'Start' = 'Datetime', 'End' = 'Datetime'),
match_fun = c(`==`, `<=`, `>=`))
# ID.x Start End ID.y Datetime
#1 1 2021-05-20 13:00:00 2021-05-20 13:03:00 1 2021-05-20 13:02:31
#2 1 2021-05-20 13:05:00 2021-05-20 13:09:45 1 2021-05-20 13:08:00
#3 1 2021-05-20 13:05:00 2021-05-20 13:09:45 1 2021-05-20 13:08:30
#4 2 2021-05-20 13:01:00 2021-05-20 13:09:50 2 2021-05-20 13:07:09
#5 2 2021-05-21 13:15:05 2021-05-21 14:00:00 2 2021-05-21 13:50:07
#6 2 2021-05-21 13:15:05 2021-05-21 14:00:00 2 2021-05-21 13:51:56