
Group all events within a period after the first occurrence in R


I have a data frame of events and am trying to group all events that fall within non-overlapping 120-day periods, each period starting at the first event of its group. So with the following example data:

library(tidyverse)

df = tibble(
  event = c(1, 2, 3, 4, 5, 6),
  date = as.Date(c("1992-01-03", "1992-09-07", "1992-11-26", "1993-01-29", "1993-02-18", "1993-04-02")),
  duration = c(4, 23, 60, 18, 30, 5)
)
df = df |>
  mutate(period_end = date + 120) 
df

# A tibble: 6 x 4
  event date       duration period_end
  <dbl> <date>        <dbl> <date>    
1     1 1992-01-03        4 1992-05-02
2     2 1992-09-07       23 1993-01-05
3     3 1992-11-26       60 1993-03-26
4     4 1993-01-29       18 1993-05-29
5     5 1993-02-18       30 1993-06-18
6     6 1993-04-02        5 1993-07-31

I need to create groups of events that start within 120 days of the first event in the group. In this case I would have groups A: [1], B: [2, 3], and C: [4, 5, 6]. Group B starts on 1992-09-07 and ends on 1993-01-05; event 4 is not within this window, even though it starts within about two months of event 3, because it falls after the group's end date. Group C would start with event 4 and extend until 1993-05-29.

I can test whether each event is within the window of the previous one with date < lag(period_end), but this only compares each event with the window of the event immediately before it. How can I create groups that "look" further back, so that each group closes based on its first event?
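
To make the limitation concrete, here is a minimal sketch of that pairwise check, written in base R so it runs without dplyr. The vector `prev_end` stands in for `lag(period_end)`, and the names `prev_end`, `within_prev`, and `chained` are only illustrative:

```r
date <- as.Date(c("1992-01-03", "1992-09-07", "1992-11-26",
                  "1993-01-29", "1993-02-18", "1993-04-02"))
period_end <- date + 120

# End of the previous event's window (NA for the first event)
prev_end <- c(as.Date(NA), head(period_end, -1))

# TRUE when an event starts inside the previous event's window
within_prev <- date < prev_end

# Open a new group whenever the event is NOT inside the previous window
chained <- cumsum(is.na(within_prev) | !within_prev)
chained
#> [1] 1 2 2 2 2 2
```

Because each event is compared only with its immediate predecessor, events 4 to 6 chain onto group B instead of opening group C.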


Solution

  • If you would like to treat the question as practice in basic programming, you can solve it with a simple loop like the one below (it assumes date is sorted in ascending order)

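    grp <- integer(length(date))   # group id for each event
    id <- 0L                       # id of the current group
    start <- as.Date(NA)           # first date of the current group
    for (j in seq_along(date)) {
        # open a new group for the first event, or when the event
        # falls outside the 120-day window of the current group
        if (j == 1 || date[j] > start + 120) {
            id <- id + 1L
            start <- date[j]
        }
        grp[j] <- id
    }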
    

    and you will have the grouping information grp

    > grp
    [1] 1 2 2 3 3 3
    

    Data

    date <- as.Date(c("1992-01-03", "1992-09-07", "1992-11-26", "1993-01-29", "1993-02-18", "1993-04-02"))
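
As an alternative to the explicit loop, the same idea can be sketched with base R's `Reduce()`, which carries the start date of the current window forward. This is only one possible formulation, not the answer's method; it assumes `date` is sorted in ascending order, and the names `starts` and `windows` are illustrative:

```r
date <- as.Date(c("1992-01-03", "1992-09-07", "1992-11-26",
                  "1993-01-29", "1993-02-18", "1993-04-02"))

# Work on day counts so Reduce() returns a plain numeric vector
days <- as.numeric(date)

# For each event, keep the current window start, or reset it to the
# event's own date when the event falls after start + 120 days
starts <- Reduce(function(start, d) if (d > start + 120) d else start,
                 days, accumulate = TRUE)

# Number the distinct window starts in order of appearance
grp <- match(starts, unique(starts))
grp
#> [1] 1 2 2 3 3 3

# Start and end date of the window each event belongs to
windows <- data.frame(
  grp   = grp,
  start = as.Date(starts, origin = "1970-01-01"),
  end   = as.Date(starts + 120, origin = "1970-01-01")
)
```

The `windows` data frame can then be column-bound to the original events if the window boundaries are needed alongside the group id.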