Count years until the FIRST occurrence of an event


I am working with a longitudinal dataset of districts. blackout indicates whether the district experienced at least one blackout during that year.

df <- data.frame(district= rep(c(1000:1003), each=4),
                 year= rep(c(2000:2003), times=4),
                 blackout= c(0,0,1,1,0,0,0,0,1,1,1,1,0,1,0,1))

I want to calculate how many years it took for each district to experience its FIRST blackout, counting the first year of the panel (2000) as year 1.

The new df should look like this:

df.1 <- data.frame(district= c(1000:1003),
                   time= c(3,5,1,2))

Note that blackouts can be intermittent, but I only care about the time until the FIRST one. Districts that went the whole period without a blackout should be listed as 5 (one more than the four years observed).

Thank you


Solution

  • This approach takes the earliest blackout year per district, so it does not depend on the rows being sorted or on every year being present.

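    library(tidyverse)  # .by and join_by() need dplyr >= 1.1.0

    # Earliest blackout year per district; districts with no blackout drop out here
    first_blackout <- df |>
        filter(blackout == 1) |>
        summarize(.by = district, first_blackout = min(year))

    # Re-attach every district, then count years from the start of the panel
    df |>
        distinct(district) |>
        left_join(first_blackout, by = join_by(district)) |>
        mutate(time = first_blackout - min(df$year) + 1,
               time = replace_na(time, 5)) |>  # 5 = no blackout in the period
        select(district, time)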
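
For comparison, here is a minimal base R sketch of the same idea that does not hard-code the 5. It assumes the "no blackout" value should always be one more than the number of distinct years in the data, which matches the example but may not suit every panel. The names start_year, no_blackout and result are only illustrative.

```r
# Sample data from the question
df <- data.frame(district = rep(1000:1003, each = 4),
                 year     = rep(2000:2003, times = 4),
                 blackout = c(0,0,1,1, 0,0,0,0, 1,1,1,1, 0,1,0,1))

start_year  <- min(df$year)
# Assumption: districts with no blackout get (number of distinct years + 1)
no_blackout <- length(unique(df$year)) + 1

# For each district, take the earliest blackout year and count from start_year;
# min() over the matching years means row order and gaps do not matter
time <- sapply(split(df, df$district), function(d) {
  hit <- d$year[d$blackout == 1]
  if (length(hit) == 0) no_blackout else min(hit) - start_year + 1
})

result <- data.frame(district = as.integer(names(time)),
                     time     = unname(time))
result
```

With the sample data, result should have one row per district with the same time values as df.1 in the question.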