I have a data frame of events and am trying to group all of the events within non-overlapping 120 day periods of the first event. So with the following example data:
library(tidyverse)
df = tibble(
event = c(1, 2, 3, 4, 5, 6),
date = as.Date(c("1992-01-03", "1992-09-07", "1992-11-26", "1993-01-29", "1993-02-18", "1993-04-02")),
duration = c(4, 23, 60, 18, 30, 5)
)
df = df |>
mutate(period_end = date + 120)
df
# A tibble: 6 x 4
event date duration period_end
<dbl> <date> <dbl> <date>
1 1 1992-01-03 4 1992-05-02
2 2 1992-09-07 23 1993-01-05
3 3 1992-11-26 60 1993-03-26
4 4 1993-01-29 18 1993-05-29
5 5 1993-02-18 30 1993-06-18
6 6 1993-04-02 5 1993-07-31
I need to create groups containing the events that start within 120 days of the first event in the group. In this case I would have groups A: [1], B: [2, 3], and C: [4, 5, 6]. Group B starts on 1992-09-07 and ends on 1993-01-05. Event 4 is not within this window: it starts about two months after event 3, but that is past the end date of the group. Group C would start with event 4 and extend until 1993-05-29.
I can test whether each event is within the window of the previous one with date < lag(period_end), but this only compares each event with the one immediately before it. How can I create groups that "look" further back, so that a group is closed based on the first event in the group?
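To make the problem with the pairwise check concrete, here is a minimal base R sketch on the example dates. The names `prev_end` and `within_prev` are only illustrative, and `c(as.Date(NA), head(period_end, -1))` stands in for `lag(period_end)`.

```r
date <- as.Date(c("1992-01-03", "1992-09-07", "1992-11-26",
                  "1993-01-29", "1993-02-18", "1993-04-02"))
period_end <- date + 120

# Window end of the immediately preceding event (NA for the first event)
prev_end <- c(as.Date(NA), head(period_end, -1))

# TRUE when an event starts inside the previous event's window
within_prev <- date < prev_end
within_prev
# [1]    NA FALSE  TRUE  TRUE  TRUE  TRUE
```

Events 3 to 6 all come back TRUE, so the pairwise check would chain events 2 through 6 into one group instead of closing group B at 1993-01-05.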
If you would like to treat the question as practice in basic programming skills, you can try a simple greedy loop like the one below (it assumes the dates are sorted in ascending order)
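date <- as.Date(c("1992-01-03", "1992-09-07", "1992-11-26", "1993-01-29", "1993-02-18", "1993-04-02"))
v <- date + 120                 # end of the 120-day window opened by each event
grp <- rep(0, length(date))     # 0 means "not assigned yet"
id <- 1
i <- 1
repeat {
  # event i opens group `id`; assign every following event inside its window
  for (j in i:length(date)) {
    if (date[j] <= v[i]) {
      grp[j] <- id
    } else {
      # event j is past the window, so it will open the next group
      id <- id + 1
      break
    }
  }
  # stop once the last event has been assigned (this also covers a final
  # event that forms a group on its own)
  if (grp[length(grp)] != 0) break
  i <- j
}
and you will have the grouping information grp
> grp
[1] 1 2 2 3 3 3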
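The same idea can also be written without explicit index bookkeeping. The following is only a sketch, not part of the loop above: `group_by_window` is a hypothetical helper name, and it assumes the dates are sorted in ascending order. It uses base R's `Reduce(..., accumulate = TRUE)` to carry the start date of the current group forward and opens a new group whenever an event falls after that start plus the window.

```r
# Hypothetical helper: group ids such that every group spans at most
# `window` days counted from its first event. Assumes sorted dates.
group_by_window <- function(dates, window = 120) {
  d <- as.numeric(dates)  # days since 1970-01-01
  # For each event, the start date of the group it belongs to
  starts <- Reduce(
    function(start, x) if (x > start + window) x else start,
    d,
    accumulate = TRUE
  )
  # Number the groups in order of appearance
  match(starts, unique(starts))
}

date <- as.Date(c("1992-01-03", "1992-09-07", "1992-11-26",
                  "1993-01-29", "1993-02-18", "1993-04-02"))
grp <- group_by_window(date)
grp
# [1] 1 2 2 3 3 3
```

With the data frame from the question, something like `df$group <- group_by_window(df$date)` should attach the ids as a column, after which the usual `group_by(group)` workflow applies.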