I am working with a longitudinal dataset of districts. The variable blackout indicates whether the district experienced at least one blackout during that year.
df <- data.frame(district = rep(1000:1003, each = 4),
                 year     = rep(2000:2003, times = 4),
                 blackout = c(0,0,1,1, 0,0,0,0, 1,1,1,1, 0,1,0,1))
I want to calculate how many years it took for each district to experience its FIRST blackout.
The new df should look like this:
df.1 <- data.frame(district = 1000:1003,
                   time     = c(3, 5, 1, 2))
Note that blackouts can be intermittent, but I only care about the year of the FIRST one, counted from the start of the period. Districts that survived the whole period without a blackout should be listed as 5.
Thank you
This method takes the earliest blackout year per district, so it does not depend on row order and will not have trouble with missing or unsorted years.
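library(tidyverse)  # .by and join_by() need dplyr >= 1.1.0

# Earliest year with a blackout, one row per district that had any
first_blackout <- df |>
  filter(blackout == 1) |>
  summarize(.by = district, first_blackout = min(year))

# Join back to all districts so those without a blackout are kept (as NA)
df |>
  distinct(district) |>
  left_join(first_blackout, by = join_by(district)) |>
  mutate(time = first_blackout - min(df$year) + 1,  # years counted from the first year
         time = replace_na(time, 5))                # no blackout in the period -> 5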
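If you would rather not depend on the tidyverse, the same idea can be sketched in base R. This is only a sketch under a couple of assumptions of mine: the name `result` and the helper variables `start_year` and `no_blackout_time` are made up for illustration, and I am assuming the "5" in the question means "one more than the number of years in the panel", so it is computed from the data instead of hard-coded.

```r
# Example data from the question
df <- data.frame(district = rep(1000:1003, each = 4),
                 year     = rep(2000:2003, times = 4),
                 blackout = c(0,0,1,1, 0,0,0,0, 1,1,1,1, 0,1,0,1))

start_year <- min(df$year)

# Assumption: districts with no blackout get (number of years in the panel) + 1,
# which is 5 for the 2000-2003 example
no_blackout_time <- max(df$year) - start_year + 2

# For each district, find the earliest blackout year and convert it to a
# 1-based count from the first year of the panel
time <- vapply(split(df, df$district), function(d) {
  hit <- d$year[d$blackout == 1]
  if (length(hit) == 0) no_blackout_time else min(hit) - start_year + 1
}, numeric(1))

result <- data.frame(district = as.integer(names(time)),
                     time     = unname(time))
result
```

Because it uses `min()` over the blackout years within each district, this version also does not care whether the rows are sorted by year.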