
Generating sequence in R for specific years


I want to create a decreasing counter for some years in my data. Basically, I have two incident dates and I want to count down from the first to the second. Some cases have no incidents at all.

In my very badly simulated data below, group a has incident 1 in 1995 and incident 2 in 1999. I want a new column that counts down from 4 in 1995 to 3 in 1996, 2 in 1997, and so on until 0 in 1999, with NAs before and after that. How do I do that? I played around with seq, but can't seem to manage it.

year <- seq(from = 1990, to=2000)
id <- letters[seq( from = 1, to = 3 )]
df <- data.frame( expand.grid(year, id))
df$inc1[df$Var2 == "a"] <- 1995
df$inc1[df$Var2 == "b"] <- 1992
df$inc2[df$Var2 == "a"] <- 1999
df$inc2[df$Var2 == "b"] <- 1997

The desired result looks like this


   Var1 Var2 inc1 inc2 diff
1  1990    a 1995 1999 NA
2  1991    a 1995 1999 NA
3  1992    a 1995 1999 NA
4  1993    a 1995 1999 NA
5  1994    a 1995 1999 NA
6  1995    a 1995 1999 4
7  1996    a 1995 1999 3
8  1997    a 1995 1999 2
9  1998    a 1995 1999 1
10 1999    a 1995 1999 0
11 2000    a 1995 1999 NA
12 1990    b 1992 1997 NA
13 1991    b 1992 1997 NA
14 1992    b 1992 1997 5
15 1993    b 1992 1997 4
16 1994    b 1992 1997 3
17 1995    b 1992 1997 2
18 1996    b 1992 1997 1
19 1997    b 1992 1997 0
20 1998    b 1992 1997 NA
21 1999    b 1992 1997 NA
22 2000    b 1992 1997 NA
23 1990    c   NA   NA NA
24 1991    c   NA   NA NA
25 1992    c   NA   NA NA
26 1993    c   NA   NA NA
27 1994    c   NA   NA NA
28 1995    c   NA   NA NA
29 1996    c   NA   NA NA
30 1997    c   NA   NA NA
31 1998    c   NA   NA NA
32 1999    c   NA   NA NA
33 2000    c   NA   NA NA

Edit: added result, sorry about the missing years


Solution

  • You can use a combination of rowwise() and case_when() from the dplyr package for complex condition handling:

    year <- seq(from = 1990, to=2000)
    id <- letters[seq( from = 1, to = 3 )]
    df <- data.frame( expand.grid(year, id))
    df$inc1[df$Var2 == "a"] <- 1995
    df$inc1[df$Var2 == "b"] <- 1992
    df$inc2[df$Var2 == "a"] <- 1999
    df$inc2[df$Var2 == "b"] <- 1997
    
    ## ------------------------------------------------------------------------
    
    library(dplyr)
    
    result <- df %>% 
      rowwise() %>% 
      mutate(diff = case_when(
        
        Var1 >= inc1 & Var1 <= inc2 ~ inc2 - Var1
        
      ))
    
    print.data.frame(result)
    #>    Var1 Var2 inc1 inc2 diff
    #> 1  1990    a 1995 1999   NA
    #> 2  1991    a 1995 1999   NA
    #> 3  1992    a 1995 1999   NA
    #> 4  1993    a 1995 1999   NA
    #> 5  1994    a 1995 1999   NA
    #> 6  1995    a 1995 1999    4
    #> 7  1996    a 1995 1999    3
    #> 8  1997    a 1995 1999    2
    #> 9  1998    a 1995 1999    1
    #> 10 1999    a 1995 1999    0
    #> 11 2000    a 1995 1999   NA
    #> 12 1990    b 1992 1997   NA
    #> 13 1991    b 1992 1997   NA
    #> 14 1992    b 1992 1997    5
    #> 15 1993    b 1992 1997    4
    #> 16 1994    b 1992 1997    3
    #> 17 1995    b 1992 1997    2
    #> 18 1996    b 1992 1997    1
    #> 19 1997    b 1992 1997    0
    #> 20 1998    b 1992 1997   NA
    #> 21 1999    b 1992 1997   NA
    #> 22 2000    b 1992 1997   NA
    #> 23 1990    c   NA   NA   NA
    #> 24 1991    c   NA   NA   NA
    #> 25 1992    c   NA   NA   NA
    #> 26 1993    c   NA   NA   NA
    #> 27 1994    c   NA   NA   NA
    #> 28 1995    c   NA   NA   NA
    #> 29 1996    c   NA   NA   NA
    #> 30 1997    c   NA   NA   NA
    #> 31 1998    c   NA   NA   NA
    #> 32 1999    c   NA   NA   NA
    #> 33 2000    c   NA   NA   NA
    

    Created on 2020-11-18 by the reprex package (v0.3.0)

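    rowwise() makes sure that the computation is done by row rather than vectorized over the whole column. In the case_when() statement, we check whether Var1 is greater than or equal to inc1 and less than or equal to inc2; if so, we subtract Var1 from inc2 in that row. Rows that match no condition, including those where inc1 or inc2 is NA, get NA. Because the comparisons and the subtraction are already vectorized, rowwise() is not strictly required for this particular calculation.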
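
The same window check can also be sketched in base R without dplyr. This is only an illustrative alternative, not part of the original answer: the helper vector `in_window` is a name made up here, and the explicit `Var1`/`Var2` column names passed to `expand.grid()` are an assumption chosen to match the output above.

```r
# Rebuild the example data; column names are set explicitly here (assumption)
year <- seq(from = 1990, to = 2000)
id <- letters[seq(from = 1, to = 3)]
df <- expand.grid(Var1 = year, Var2 = id)

# Start with NA so that group "c" keeps missing incident years
df$inc1 <- NA_real_
df$inc2 <- NA_real_
df$inc1[df$Var2 == "a"] <- 1995
df$inc1[df$Var2 == "b"] <- 1992
df$inc2[df$Var2 == "a"] <- 1999
df$inc2[df$Var2 == "b"] <- 1997

# TRUE only for rows between the two incidents; rows with missing incidents are FALSE
in_window <- !is.na(df$inc1) & !is.na(df$inc2) &
  df$Var1 >= df$inc1 & df$Var1 <= df$inc2

# Years remaining until the second incident inside the window, NA elsewhere
df$diff <- ifelse(in_window, df$inc2 - df$Var1, NA_real_)

print(df)
```

Since `ifelse()` works on whole columns, no row-by-row iteration is needed, and the explicit `is.na()` checks keep the rows of group c at NA.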