Search code examples
rdata-manipulationlongitudinal

Create a dummy variable indicating if an event occured the past 2 years


I have a longitudinal dataset where I want to create a column that indicates if an event occurred for an individual in the past two years (t-2). I've created toy data that share the main characteristics with my data set (see code below).

set.seed(123)


df <- data.frame(id = sort(rep(1:10,5)),
                 time = rep(2011:2015, 10),
                 event = rbinom(50, 1, 0.2))
                 
head(df,10)   

# Output
   id time event
1   1 2011     0
2   1 2012     0
3   1 2013     0
4   1 2014     1
5   1 2015     1
6   2 2011     0
7   2 2012     0
8   2 2013     1
9   2 2014     0
10  2 2015     0

In this data, I want to create a new column occurred that indicates if the event occurred in the past two years. For the first 10 rows, this would lead to data looking like this:

   id time event occured
1   1 2011     0       0
2   1 2012     0       0
3   1 2013     0       0
4   1 2014     1       1
5   1 2015     1       1
6   2 2011     0       0
7   2 2012     0       0
8   2 2013     1       1
9   2 2014     0       1
10  2 2015     0       1

In the best world, I want the number of years going backward to be a parameter that can be changed (i.e., making it relatively easy to allow occurred to be constructed so it indicates if the event happened the past 1 year backward or the past 4 years as well).

Thank you!


Solution

  • A data.table approach

    assumes there are no missing years in your data.. So all years 2011:2015 are either 0 or 1.

    library(data.table)
    # make it a data.table
    setDT(df)
    # 
    df[, occured := as.numeric(
      frollsum(event, n = 3, align = "right", fill = event[1]) > 0), 
      by = .(id)]
    
    #    id time event occured
    # 1:  1 2011     0       0
    # 2:  1 2012     0       0
    # 3:  1 2013     0       0
    # 4:  1 2014     1       1
    # 5:  1 2015     1       1
    # 6:  2 2011     0       0
    # 7:  2 2012     0       0
    # 8:  2 2013     1       1
    # 9:  2 2014     0       1
    #10:  2 2015     0       1
    #11:  3 2011     1       1
    #12:  3 2012     0       1
    #13:  3 2013     0       1
    #14:  3 2014     0       0
    #15:  3 2015     0       0
    #16:  4 2011     1       1
    #17:  4 2012     0       1
    #18:  4 2013     0       1
    #19:  4 2014     0       0
    #20:  4 2015     1       1
    #  ...