Assign unique ID within a time interval following an event

This is a bit of a curious case for which I have been unable to find a solution on stackoverflow. I have a dataset with a date-time column and a column of values that indicate an event, such as in the dat example below. The date-times are every hour, however, note that occasional "missed" hours exist (2 hours are missing between rows 12 & 13).

dat <- data.frame(datetime = seq(min(as.POSIXct("2010-04-03 03:00:00 UTC")), 
                                 max(as.POSIXct("2010-04-04 10:00:00 UTC")), by = "hour")[-c(13,14)],
                  event = c(1, rep(NA, 9), 2, rep(NA, 5), 3, 4, rep(NA, 9), 5, NA, 6))

> dat
              datetime event
1  2010-04-03 03:00:00     1
2  2010-04-03 04:00:00    NA
3  2010-04-03 05:00:00    NA
4  2010-04-03 06:00:00    NA
5  2010-04-03 07:00:00    NA
6  2010-04-03 08:00:00    NA
7  2010-04-03 09:00:00    NA
8  2010-04-03 10:00:00    NA
9  2010-04-03 11:00:00    NA
10 2010-04-03 12:00:00    NA
11 2010-04-03 13:00:00     2
12 2010-04-03 14:00:00    NA
13 2010-04-03 17:00:00    NA
14 2010-04-03 18:00:00    NA
15 2010-04-03 19:00:00    NA
16 2010-04-03 20:00:00    NA
17 2010-04-03 21:00:00     3
18 2010-04-03 22:00:00     4
19 2010-04-03 23:00:00    NA
20 2010-04-04 00:00:00    NA
21 2010-04-04 01:00:00    NA
22 2010-04-04 02:00:00    NA
23 2010-04-04 03:00:00    NA
24 2010-04-04 04:00:00    NA
25 2010-04-04 05:00:00    NA
26 2010-04-04 06:00:00    NA
27 2010-04-04 07:00:00    NA
28 2010-04-04 08:00:00     5
29 2010-04-04 09:00:00    NA
30 2010-04-04 10:00:00     6

I would like each row within an interval of 7 hours after the event occurs to be identified with a unique identifier, but with the following caveats (hence the "curious case"):

if a subsequent event occurs within the 7 hours of the event prior, that subsequent event is essentially ignored (i.e., "event" number does not equal assigned identifier value), and
missing times are accounted for (i.e., the rule is based on the time elapsed, not the number of rows).

The product would look like result:

library(dplyr)

result <- dat %>% 
  mutate(id = c(rep(1, 8), rep(NA, 2), rep(2, 6), rep(3, 8), rep(NA, 3), rep(4, 3)))

> result
              datetime event id
1  2010-04-03 03:00:00     1  1
2  2010-04-03 04:00:00    NA  1
3  2010-04-03 05:00:00    NA  1
4  2010-04-03 06:00:00    NA  1
5  2010-04-03 07:00:00    NA  1
6  2010-04-03 08:00:00    NA  1
7  2010-04-03 09:00:00    NA  1
8  2010-04-03 10:00:00    NA  1
9  2010-04-03 11:00:00    NA NA
10 2010-04-03 12:00:00    NA NA
11 2010-04-03 13:00:00     2  2
12 2010-04-03 14:00:00    NA  2
13 2010-04-03 17:00:00    NA  2
14 2010-04-03 18:00:00    NA  2
15 2010-04-03 19:00:00    NA  2
16 2010-04-03 20:00:00    NA  2
17 2010-04-03 21:00:00     3  3
18 2010-04-03 22:00:00     4  3
19 2010-04-03 23:00:00    NA  3
20 2010-04-04 00:00:00    NA  3
21 2010-04-04 01:00:00    NA  3
22 2010-04-04 02:00:00    NA  3
23 2010-04-04 03:00:00    NA  3
24 2010-04-04 04:00:00    NA  3
25 2010-04-04 05:00:00    NA NA
26 2010-04-04 06:00:00    NA NA
27 2010-04-04 07:00:00    NA NA
28 2010-04-04 08:00:00     5  4
29 2010-04-04 09:00:00    NA  4
30 2010-04-04 10:00:00     6  4

Most ideally, this would be accomplished in a dplyr framework.

Solution

library(lubridate)
library(tidyverse)

dat <- data.frame(datetime = seq(min(as.POSIXct("2010-04-03 03:00:00 UTC")), 
                                 max(as.POSIXct("2010-04-04 10:00:00 UTC")), by = "hour")[-c(13,14)],
                  event = c(1, rep(NA, 9), 2, rep(NA, 5), 3, 4, rep(NA, 9), 5, NA, 6)) %>% 
  mutate(id = c(rep(1, 8), rep(NA, 2), rep(2, 6), rep(3, 8), rep(NA, 3), rep(4, 3)))


Events <- dat %>% 
  #Get only the roes with events
  filter(!is.na(event)) %>% 
  #Get the duration of time between events
  mutate(
    EventLag = datetime - lag(datetime)) %>% 
  ## remove events that occurred < 7 hrs after the previous or that are NA (i.e. the first one). but in the real data
  ## I do not suspect your first point would ever be an event...? Maybe this can be removed in the 
  ## real dataset...
  filter(as.numeric(EventLag) > 7| is.na(EventLag)) %>% 
  as.data.frame()

## You now have all of the events that are of interest (i.e. those that occurred outside of the 7 hr buffer)
## Give the events a new ID so there are no gaps
## Join them with the rest of the datetime stamps
Events <- Events %>% 
  mutate(ID = row_number()) %>% 
  dplyr::select(datetime, ID)


## Expand each event by 7 hrs
Events <- Events %>%
  group_by(ID) %>%
  do(data.frame(ID= .$ID, datetime= seq(.$datetime, .$datetime + hours(7), by = '1 hour'), stringsAsFactors=FALSE)) %>% 
  as.data.frame()


## Join with initial data by datettime
DatJoin <- dat %>% 
  left_join(Events, by = "datetime")


DatJoin