I have a dataset of protest events in the United States. Some events are stand-alone events, while others persist day-after-day (a "multi-day event"). My dataset is structured at the daily level, so a three-day multi-day event is spread out over three rows.
I want to accomplish the following:
Here's a reproducible example:
# Library
library(tidyverse) # Brings in dplyr
# Generate the data set of protests
test <- data.frame(state = c(rep("Washington", 10), rep("Idaho", 10)),
date = lubridate::ymd(c("2021-01-01", "2021-01-03", "2021-01-04", "2021-01-10", "2021-01-15",
"2021-01-16", "2021-01-17", "2021-01-18", "2021-01-19", "2021-01-28",
"2021-01-12", "2021-01-13", "2021-01-14", "2021-02-01", "2021-02-03",
"2021-02-04", "2021-02-05", "2021-02-08", "2021-02-10", "2021-02-14")))
# Now create some lags and leads
test <- test %>%
group_by(state) %>%
mutate(date_lag = lag(date),
date_lead = lead(date),
days_last = date - date_lag,
days_next = date_lead - date,
link_last = if_else(days_last <= 1, 1, 0),
link_next = if_else(days_next <= 1, 1, 0),
sequence = if_else(link_last == 0 & link_next == 1, "First day",
if_else(is.na(link_last) == TRUE & link_next == 1, "First day",
if_else(link_last == 1 & link_next == 1, "Ongoing",
if_else(link_last == 1 & link_next == 0, "Last day",
if_else(link_last == 1 & is.na(link_next)==TRUE, "Last day", "Not linked"))))))
This generates the following dataframe:
state date date_lag date_lead days_last days_next link_last link_next sequence
<chr> <date> <date> <date> <drtn> <drtn> <dbl> <dbl> <chr>
1 Washington 2021-01-01 NA 2021-01-03 NA days 2 days NA 0 NA
2 Washington 2021-01-03 2021-01-01 2021-01-04 2 days 1 days 0 1 First day
3 Washington 2021-01-04 2021-01-03 2021-01-10 1 days 6 days 1 0 Last day
4 Washington 2021-01-10 2021-01-04 2021-01-15 6 days 5 days 0 0 Not linked
5 Washington 2021-01-15 2021-01-10 2021-01-16 5 days 1 days 0 1 First day
6 Washington 2021-01-16 2021-01-15 2021-01-17 1 days 1 days 1 1 Ongoing
7 Washington 2021-01-17 2021-01-16 2021-01-18 1 days 1 days 1 1 Ongoing
8 Washington 2021-01-18 2021-01-17 2021-01-19 1 days 1 days 1 1 Ongoing
9 Washington 2021-01-19 2021-01-18 2021-01-28 1 days 9 days 1 0 Last day
10 Washington 2021-01-28 2021-01-19 NA 9 days NA days 0 NA NA
11 Idaho 2021-01-12 NA 2021-01-13 NA days 1 days NA 1 NA
12 Idaho 2021-01-13 2021-01-12 2021-01-14 1 days 1 days 1 1 Ongoing
13 Idaho 2021-01-14 2021-01-13 2021-02-01 1 days 18 days 1 0 Last day
14 Idaho 2021-02-01 2021-01-14 2021-02-03 18 days 2 days 0 0 Not linked
15 Idaho 2021-02-03 2021-02-01 2021-02-04 2 days 1 days 0 1 First day
16 Idaho 2021-02-04 2021-02-03 2021-02-05 1 days 1 days 1 1 Ongoing
17 Idaho 2021-02-05 2021-02-04 2021-02-08 1 days 3 days 1 0 Last day
18 Idaho 2021-02-08 2021-02-05 2021-02-10 3 days 2 days 0 0 Not linked
19 Idaho 2021-02-10 2021-02-08 2021-02-14 2 days 4 days 0 0 Not linked
20 Idaho 2021-02-14 2021-02-10 NA 4 days NA days 0 NA NA
state date date_lag date_lead days_last days_next link_last link_next sequence cumulative duration name
<chr> <date> <date> <date> <drtn> <drtn> <dbl> <dbl> <chr> <dbl> <dbl> <chr>
1 Washington 2021-01-01 NA 2021-01-03 NA days 2 days NA 0 NA NA 0 NA
2 Washington 2021-01-03 2021-01-01 2021-01-04 2 days 1 days 0 1 First day 1 2 Washington.1
3 Washington 2021-01-04 2021-01-03 2021-01-10 1 days 6 days 1 0 Last day 2 2 Washington.1
4 Washington 2021-01-10 2021-01-04 2021-01-15 6 days 5 days 0 0 Not linked NA 0 NA
5 Washington 2021-01-15 2021-01-10 2021-01-16 5 days 1 days 0 1 First day 1 5 Washington.2
6 Washington 2021-01-16 2021-01-15 2021-01-17 1 days 1 days 1 1 Ongoing 2 5 Washington.2
7 Washington 2021-01-17 2021-01-16 2021-01-18 1 days 1 days 1 1 Ongoing 3 5 Washington.2
8 Washington 2021-01-18 2021-01-17 2021-01-19 1 days 1 days 1 1 Ongoing 4 5 Washington.2
9 Washington 2021-01-19 2021-01-18 2021-01-28 1 days 9 days 1 0 Last day 5 5 Washington.2
10 Washington 2021-01-28 2021-01-19 NA 9 days NA days 0 NA NA NA NA NA
11 Idaho 2021-01-12 NA 2021-01-13 NA days 1 days NA 1 NA 1 3 Idaho.1
12 Idaho 2021-01-13 2021-01-12 2021-01-14 1 days 1 days 1 1 Ongoing 2 3 Idaho.1
13 Idaho 2021-01-14 2021-01-13 2021-02-01 1 days 18 days 1 0 Last day 3 3 Idaho.1
14 Idaho 2021-02-01 2021-01-14 2021-02-03 18 days 2 days 0 0 Not linked NA NA NA
15 Idaho 2021-02-03 2021-02-01 2021-02-04 2 days 1 days 0 1 First day 1 3 Idaho.2
16 Idaho 2021-02-04 2021-02-03 2021-02-05 1 days 1 days 1 1 Ongoing 2 3 Idaho.2
17 Idaho 2021-02-05 2021-02-04 2021-02-08 1 days 3 days 1 0 Last day 3 3 Idaho.2
18 Idaho 2021-02-08 2021-02-05 2021-02-10 3 days 2 days 0 0 Not linked NA NA NA
19 Idaho 2021-02-10 2021-02-08 2021-02-14 2 days 4 days 0 0 Not linked NA NA NA
20 Idaho 2021-02-14 2021-02-10 NA 4 days NA days 0 NA NA NA NA NA
Side question: Why is test$sequence[11] an NA and not "First day"?
I'm not sure these are the specific numbers you're looking for, but this represents what seems to me a simpler and more idiomatic tidyverse approach:
test %>%
group_by(state) %>%
mutate(days_last = as.numeric(date - lag(date)),
new_section = 1*(is.na(days_last) | days_last > 1), # EDIT
section = cumsum(new_section),
name = paste(state,section, sep = ".")) %>%
group_by(name) %>%
mutate(duration = as.numeric(max(date) - min(date) + 1),
sequence = case_when(duration == 1 ~ "Unlinked",
row_number() == 1 ~ "First Day",
row_number() == n() ~ "Last Day",
TRUE ~ "Ongoing")) %>%
ungroup()
Here, I mark any gap of more than one day as a new event, take the cumulative sum, and use that to define the duration of each event.
# A tibble: 20 x 8
state date days_last new_section section name duration sequence
<chr> <date> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
1 Washington 2021-01-01 NA 1 1 Washington.1 1 Unlinked
2 Washington 2021-01-03 2 1 2 Washington.2 2 First Day
3 Washington 2021-01-04 1 0 2 Washington.2 2 Last Day
4 Washington 2021-01-10 6 1 3 Washington.3 1 Unlinked
5 Washington 2021-01-15 5 1 4 Washington.4 5 First Day
6 Washington 2021-01-16 1 0 4 Washington.4 5 Ongoing
7 Washington 2021-01-17 1 0 4 Washington.4 5 Ongoing
8 Washington 2021-01-18 1 0 4 Washington.4 5 Ongoing
9 Washington 2021-01-19 1 0 4 Washington.4 5 Last Day
10 Washington 2021-01-28 9 1 5 Washington.5 1 Unlinked
11 Idaho 2021-01-12 NA 1 1 Idaho.1 3 First Day
12 Idaho 2021-01-13 1 0 1 Idaho.1 3 Ongoing
13 Idaho 2021-01-14 1 0 1 Idaho.1 3 Last Day
14 Idaho 2021-02-01 18 1 2 Idaho.2 1 Unlinked
15 Idaho 2021-02-03 2 1 3 Idaho.3 3 First Day
16 Idaho 2021-02-04 1 0 3 Idaho.3 3 Ongoing
17 Idaho 2021-02-05 1 0 3 Idaho.3 3 Last Day
18 Idaho 2021-02-08 3 1 4 Idaho.4 1 Unlinked
19 Idaho 2021-02-10 2 1 5 Idaho.5 1 Unlinked
20 Idaho 2021-02-14 4 1 6 Idaho.6 1 Unlinked