Search code examples
rdplyrtidyverselubridate

In R, how to find other rows in the same group_by that belong to a date range defined by row?


In R I have this object:

z <- 
  dplyr::tribble(
    ~event_id, ~group_id, ~date_event, ~date_min,   ~date_max,  
    1, 1, "2019-11-11", "2019-11-04", "2019-11-18",
    2, 1, "2019-11-13", "2019-11-06", "2019-11-20",
    3, 1, "2019-11-19", "2019-11-12", "2019-11-26",
    4, 1, "2020-04-30", "2020-04-23", "2020-05-07",
    5, 2, "2019-11-05",  "2019-10-29", "2019-11-12",
    6, 2, "2019-11-26", "2019-11-19", "2019-12-03"
    ) %>%
  dplyr::mutate_if(is.character,lubridate::as_date)

My data follows this structure: an event_id for each row; group_id for each group; a date_event; and a range that is between 7 days before and 7 days after the date_event (date_min and date_max).

What I want to know is: for each row (each event), what are the other events inside the same group_id that match the range (based on date_min and date_max) for that row?!

An output example:

  • event_id = 2, that belongs to group_id = 1, have inside its range the events number 1 and 3, that belong to the same group_id too.

I'm not sure of what output format exactly I want, but I need this result, and I cannot figure out how to solve it.

Someone could help? Thanks in advance. Best, Wlademir.


Solution

  • You can first group, then use purrr::map2 for this. For that to work, you pass in those date thresholds to map over simultaneously and the event id & date vectors as an additional argument:

    library(dplyr)
    library(purrr)
    
    z2 <- z %>%
      group_by(group_id) %>%
      mutate(
        event_ids_in_window = map2(
          date_min, date_max,
          event_ids = event_id, dates = date_event,
          .f = function(date_min, date_max, event_ids, dates) {
            event_ids[which(dates >= date_min & dates <= date_max)]
          }
        )
      )