Search code examples
rdatedatetimetimelubridate

How to filter date and time in one dataset based on values in another dataset


Assume the following dataset (called Active) contains date and time records of when individuals entered and left a room over several days. Individuals are denoted by a unique ID, Start is the time they entered the room, and End is the time they left the room.

ID  Date      Start       End  
1 20-5-2021   13:00:00     13:03:00
1 20-5-2021   13:05:00     13:09:45
1 21-5-2021   12:01:00     13:00:00
2 20-5-2021   13:01:00     13:09:50
2 21-5-2021   13:15:05     14:00:00
2 21-5-2021   15:01:00     15:10:00

Now we have another dataset (called Detections) that contains when each individual (ID) was seen in the room:

ID   Date       Time 
1  20-5-2021    9:02:32
1  20-5-2021   11:02:32
1  20-5-2021   13:02:31
1  20-5-2021   13:08:00
1  20-5-2021   13:08:30
2  20-5-2021   12:07:09
2  20-5-2021   12:30:10
2  20-5-2021   13:07:09
2  21-5-2021   13:50:07
2  21-5-2021   13:51:56

Notice that there are instances where the IDs were seen outside of the Start and End times in Active. We want to filter the rows in Detections where each ID was seen within any of the Start and End boundaries specified in Active. What is the best way to format the Date and Time data, and how can we apply this kind of filter with it in R?

The end result would return the rows from Detections where an ID was seen within the Start and End boundaries specified in Active, like this:

1  20-5-2021   13:02:31
1  20-5-2021   13:08:00
1  20-5-2021   13:08:30
2  20-5-2021   13:07:09
2  21-5-2021   13:50:07
2  21-5-2021   13:51:56

Solution

  • Combine date and time column to create date-time and use fuzzyjoin package to join in range

    library(dplyr)
    library(lubridate)
    library(tidyr)
    
    Active %>%
      mutate(Start = dmy_hms(paste(Date, Start)), 
             End = dmy_hms(paste(Date, End))) %>%
      select(-Date) %>%
      fuzzyjoin::fuzzy_inner_join(Detections %>%
      unite(Datetime, Date, Time, sep = ' ') %>%
      mutate(Datetime = dmy_hms(Datetime)), 
      by = c('ID', 'Start' = 'Datetime', 'End' = 'Datetime'), 
      match_fun = c(`==`, `<=`, `>=`))
    
    #  ID.x               Start                 End ID.y            Datetime
    #1    1 2021-05-20 13:00:00 2021-05-20 13:03:00    1 2021-05-20 13:02:31
    #2    1 2021-05-20 13:05:00 2021-05-20 13:09:45    1 2021-05-20 13:08:00
    #3    1 2021-05-20 13:05:00 2021-05-20 13:09:45    1 2021-05-20 13:08:30
    #4    2 2021-05-20 13:01:00 2021-05-20 13:09:50    2 2021-05-20 13:07:09
    #5    2 2021-05-21 13:15:05 2021-05-21 14:00:00    2 2021-05-21 13:50:07
    #6    2 2021-05-21 13:15:05 2021-05-21 14:00:00    2 2021-05-21 13:51:56