Search code examples
rdatedatetimetimelubridate

Summarizing observations within date and time boundaries in R


I have a dataframe called ActiveData that shows the date and times (StartDT and EndDT) when individual IDs were active. Active is structured like this where StartDTandEndDT` are formatted in mdy_hms:

ID         StartDT               EndDT
1  05/05/2021 8:15:00    05/05/2021 9:15:00
2  05/05/2021 8:15:00    05/05/2021 9:15:00
3  05/05/2021 8:15:00    05/05/2021 10:15:00
…

I have another dataframe called Observations that shows observations where each ID observed themselves or another ID satisfying some variable. Here, ID denotes the observer and IDobserved denotes which ID was observed satisfying the variable (IDs can also observe themselves).

ID       DT                          IDobserved
1      05/05/2021 8:19:00                1
1      05/05/2021 8:20:00                1      
1      05/05/2021 8:19:00                2
2      05/05/2021 8:19:20                1
2      05/05/2021 8:19:45                3
3      05/05/2021 8:19:00                1
3      05/05/2021 8:20:00                1
3      05/05/2021 8:25:00                1
3      05/05/2021 8:45:00                3
3      05/05/2021 8:19:00                2
…

I want to summarize the number of times that each ID observed the other IDs (including themselves) satisfying the variable within the time constraints specified by StartDT and EndDT in the ActiveData dataframe, so that the final table would specify the number of observations, and the amount of time in seconds that passed between the boundaries that each ID was actively observing (between StartDT and EndDT in ActiveData). So for the data above, the final table would look like this:

ID   IDobserved   Observations      TimeElapsed
1         1            2             3600
1         2            1             3600
2         1            1             3600
2         3            1             3600
3         1            3             7200
3         2            1             7200
3         3            1             7200

How can this be done?


Solution

  • Cool question! With your data

    ActiveData <- tibble::tribble(
      ~ID, ~StartDT,             ~EndDT,
      1,   "05/05/2021 8:15:00", "05/05/2021 9:15:00",
      2,   "05/05/2021 8:15:00", "05/05/2021 9:15:00",
      3,   "05/05/2021 8:15:00", "05/05/2021 10:15:00"
    )
    
    Observations <- tibble::tribble(
      ~ID, ~DT,                ~IDobserved,
      1,   "05/05/2021 8:19:00", 1,
      1,   "05/05/2021 8:20:00", 1,
      1,   "05/05/2021 8:19:00", 2,
      2,   "05/05/2021 8:19:20", 1,
      2,   "05/05/2021 8:19:45", 3,
      3,   "05/05/2021 8:19:00", 1,
      3,   "05/05/2021 8:20:00", 1,
      3,   "05/05/2021 8:25:00", 1,
      3,   "05/05/2021 8:45:00", 3,
      3,   "05/05/2021 8:19:00", 2
    )
    

    I would do

    library(dplyr)
    
    fmt <- "%d/%m/%Y %H:%M:%S"
    
    ActiveData %>%
      mutate(across(-ID, ~ as.POSIXct(., format = fmt))) %>%
      purrr::pmap(\(...) {
        args <- list(...)
        Observations %>%
          mutate(DT = as.POSIXct(DT, format = fmt)) %>%
          filter(DT >= args$StartDT, DT <= args$EndDT, ID == args$ID) %>%
          count(ID, IDobserved, name = "Observations") %>%
          mutate(TimeElapsed = difftime(args$EndDT,
                                        args$StartDT,
                                        units =  "secs"))
      }) %>%
      bind_rows()
    

    returning

    # A tibble: 7 x 4
         ID IDobserved Observations TimeElapsed
      <dbl>      <dbl>        <int> <drtn>
    1     1          1            2 3600 secs
    2     1          2            1 3600 secs
    3     2          1            1 3600 secs
    4     2          3            1 3600 secs
    5     3          1            3 7200 secs
    6     3          2            1 7200 secs
    7     3          3            1 7200 secs