
How to create a variable that counts years since a string (incident)?


I have a data frame containing information for each country-year on natural disasters. The structure is similar to the following:

test <- data.frame(year = rep(c(1990:1995) , times = 3) ,
                   country = rep(c("aaa" , "bbb" , "ccc") , each = 6) ,
                   incident = c(NA , "Drought" , rep(NA , times = 7) , "Flood" , "Flood" , NA ,
                                rep(NA , times = 6))  ) 

Giving:

   year country incident
1  1990     aaa     <NA>
2  1991     aaa  Drought
3  1992     aaa     <NA>
4  1993     aaa     <NA>
5  1994     aaa     <NA>
6  1995     aaa     <NA>
7  1990     bbb     <NA>
8  1991     bbb     <NA>
9  1992     bbb     <NA>
10 1993     bbb    Flood
11 1994     bbb    Flood
12 1995     bbb     <NA>
13 1990     ccc     <NA>
14 1991     ccc     <NA>
15 1992     ccc     <NA>
16 1993     ccc     <NA>
17 1994     ccc     <NA>
18 1995     ccc     <NA>

I would like to add a new variable to this data frame that counts, for each country, the number of years since the last incident. Years in which an incident occurs should be 0, and years before a country's first incident should be NA. Something like this:

   year country incident years_since
1  1990     aaa     <NA>          NA
2  1991     aaa  Drought           0
3  1992     aaa     <NA>           1
4  1993     aaa     <NA>           2
5  1994     aaa     <NA>           3
6  1995     aaa     <NA>           4
7  1990     bbb     <NA>          NA
8  1991     bbb     <NA>          NA
9  1992     bbb     <NA>          NA
10 1993     bbb    Flood           0
11 1994     bbb    Flood           0
12 1995     bbb     <NA>           1
13 1990     ccc     <NA>          NA
14 1991     ccc     <NA>          NA
15 1992     ccc     <NA>          NA
16 1993     ccc     <NA>          NA
17 1994     ccc     <NA>          NA
18 1995     ccc     <NA>          NA

Is there a way to do this efficiently using one line of code?


Solution

    1) Define a years_since function that counts the number of positions since the most recent non-NA value (located via cumsum), and apply it by country.

    library(dplyr)
    
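    years_since <- function(x) {
      # running count of incidents seen so far; 0 before the first one
      cs <- cumsum(!is.na(x))
      # match(cs, cs) is the position where the current count first appeared;
      # for cs > 0 that is the position of the most recent incident
      ifelse(cs > 0, seq_along(cs) - match(cs, cs), NA)
    }
    
    test %>%
      group_by(country) %>%
      mutate(years_since = years_since(incident)) %>%
      ungroup()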
    

    giving:

    # A tibble: 18 × 4
        year country incident years_since
       <int> <chr>   <chr>          <int>
     1  1990 aaa     <NA>              NA
     2  1991 aaa     Drought            0
     3  1992 aaa     <NA>               1
     4  1993 aaa     <NA>               2
     5  1994 aaa     <NA>               3
     6  1995 aaa     <NA>               4
     7  1990 bbb     <NA>              NA
     8  1991 bbb     <NA>              NA
     9  1992 bbb     <NA>              NA
    10  1993 bbb     Flood              0
    11  1994 bbb     Flood              0
    12  1995 bbb     <NA>               1
    13  1990 ccc     <NA>              NA
    14  1991 ccc     <NA>              NA
    15  1992 ccc     <NA>              NA
    16  1993 ccc     <NA>              NA
    17  1994 ccc     <NA>              NA
    18  1995 ccc     <NA>              NA
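
To see why years_since works, it may help to trace its intermediate vectors for a single country. The sketch below uses the incident values for "bbb" from the example data; the variable names first_pos and since are only illustrative and do not appear in the function itself.

```r
# Incident column for country "bbb" from the example data
x <- c(NA, NA, NA, "Flood", "Flood", NA)

cs        <- cumsum(!is.na(x))        # 0 0 0 1 2 2: incidents seen so far
first_pos <- match(cs, cs)            # 1 1 1 4 5 5: first position of each count
since     <- seq_along(cs) - first_pos  # 0 0 0 0 0 1: positions since that point

# Positions before the first incident (cs == 0) become NA
result <- ifelse(cs > 0, since, NA)
print(result)                         # NA NA NA 0 0 1
```

Each new incident increments cs, so match(cs, cs) jumps to the position of that incident and the count restarts at 0.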
    

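    2) Alternatively, using only base R. Because ave writes the result back into a copy of its first argument, and incident is a character vector, the counts come back as character strings; wrapping the call in as.integer restores a numeric column:

    transform(test, years_since = as.integer(ave(incident, country, FUN = years_since)))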
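
Both approaches count rows rather than calendar years, so they give the intended result only when each country's rows are sorted by year and no years are missing. If the panel might have gaps, one possible variant is to subtract the year of the most recent incident directly. The helper below, years_since_year, is a hypothetical name introduced for this sketch and is not part of the answer above; it still assumes the rows are sorted by year within a country.

```r
# Hypothetical helper: years since the most recent incident, computed from the
# year values themselves. Assumes `year` is sorted in increasing order.
years_since_year <- function(year, incident) {
  has_incident <- !is.na(incident)
  cs <- cumsum(has_incident)        # incidents seen so far (0 before the first)
  inc_years <- year[has_incident]   # years in which an incident occurred
  # cs + 1 indexes into c(NA, inc_years): NA until the first incident,
  # then the year of the most recent incident
  year - c(NA, inc_years)[cs + 1]
}

# Example with a gap: 1993 is missing from the panel
yr  <- c(1990L, 1991L, 1992L, 1994L, 1995L)
inc <- c(NA, "Drought", NA, NA, NA)
print(years_since_year(yr, inc))    # NA 0 1 3 4
```

In the dplyr pipeline this would presumably be called as mutate(years_since = years_since_year(year, incident)) after sorting by country and year.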