I have a dataframe called ActiveData
that shows the date and times (StartDT
and EndDT
) when individual ID
s were active. Active is structured like this where
StartDTand
EndDT` are formatted in mdy_hms:
ID StartDT EndDT
1 05/05/2021 8:15:00 05/05/2021 9:15:00
2 05/05/2021 8:15:00 05/05/2021 9:15:00
3 05/05/2021 8:15:00 05/05/2021 10:15:00
…
I have another dataframe called Observations
that shows observations where each ID
observed themselves or another ID
satisfying some variable. Here, ID
denotes the observer and IDobserved
denotes which ID
was observed satisfying the variable (ID
s can also observe themselves).
ID DT IDobserved
1 05/05/2021 8:19:00 1
1 05/05/2021 8:20:00 1
1 05/05/2021 8:19:00 2
2 05/05/2021 8:19:20 1
2 05/05/2021 8:19:45 3
3 05/05/2021 8:19:00 1
3 05/05/2021 8:20:00 1
3 05/05/2021 8:25:00 1
3 05/05/2021 8:45:00 3
3 05/05/2021 8:19:00 2
…
I want to summarize the number of times that each ID
observed the other ID
s (including themselves) satisfying the variable within the time constraints specified by StartDT
and EndDT
in the ActiveData
dataframe, so that the final table would specify the number of observations, and the amount of time in seconds that passed between the boundaries that each ID
was actively observing (between StartDT
and EndDT
in ActiveData
). So for the data above, the final table would look like this:
ID IDobserved Observations TimeElapsed
1 1 2 3600
1 2 1 3600
2 1 1 3600
2 3 1 3600
3 1 3 7200
3 2 1 7200
3 3 1 7200
How can this be done?
Cool question! With your data
ActiveData <- tibble::tribble(
~ID, ~StartDT, ~EndDT,
1, "05/05/2021 8:15:00", "05/05/2021 9:15:00",
2, "05/05/2021 8:15:00", "05/05/2021 9:15:00",
3, "05/05/2021 8:15:00", "05/05/2021 10:15:00"
)
Observations <- tibble::tribble(
~ID, ~DT, ~IDobserved,
1, "05/05/2021 8:19:00", 1,
1, "05/05/2021 8:20:00", 1,
1, "05/05/2021 8:19:00", 2,
2, "05/05/2021 8:19:20", 1,
2, "05/05/2021 8:19:45", 3,
3, "05/05/2021 8:19:00", 1,
3, "05/05/2021 8:20:00", 1,
3, "05/05/2021 8:25:00", 1,
3, "05/05/2021 8:45:00", 3,
3, "05/05/2021 8:19:00", 2
)
I would do
library(dplyr)
fmt <- "%d/%m/%Y %H:%M:%S"
ActiveData %>%
mutate(across(-ID, ~ as.POSIXct(., format = fmt))) %>%
purrr::pmap(\(...) {
args <- list(...)
Observations %>%
mutate(DT = as.POSIXct(DT, format = fmt)) %>%
filter(DT >= args$StartDT, DT <= args$EndDT, ID == args$ID) %>%
count(ID, IDobserved, name = "Observations") %>%
mutate(TimeElapsed = difftime(args$EndDT,
args$StartDT,
units = "secs"))
}) %>%
bind_rows()
returning
# A tibble: 7 x 4
ID IDobserved Observations TimeElapsed
<dbl> <dbl> <int> <drtn>
1 1 1 2 3600 secs
2 1 2 1 3600 secs
3 2 1 1 3600 secs
4 2 3 1 3600 secs
5 3 1 3 7200 secs
6 3 2 1 7200 secs
7 3 3 1 7200 secs