I have a longitudinal dataset where I want to create a column that indicates if an event occurred for an individual in the past two years (t-2
). I've created toy data that share the main characteristics with my data set (see code below).
set.seed(123)
df <- data.frame(id = sort(rep(1:10,5)),
time = rep(2011:2015, 10),
event = rbinom(50, 1, 0.2))
head(df,10)
# Output
id time event
1 1 2011 0
2 1 2012 0
3 1 2013 0
4 1 2014 1
5 1 2015 1
6 2 2011 0
7 2 2012 0
8 2 2013 1
9 2 2014 0
10 2 2015 0
In this data, I want to create a new column occurred
that indicates if the event occurred in the past two years. For the first 10 rows, this would lead to data looking like this:
id time event occured
1 1 2011 0 0
2 1 2012 0 0
3 1 2013 0 0
4 1 2014 1 1
5 1 2015 1 1
6 2 2011 0 0
7 2 2012 0 0
8 2 2013 1 1
9 2 2014 0 1
10 2 2015 0 1
In the best world, I want the number of years going backward to be a parameter that can be changed (i.e., making it relatively easy to allow occurred
to be constructed so it indicates if the event happened the past 1 year backward or the past 4 years as well).
Thank you!
A data.table
approach
assumes there are no missing years in your data.. So all years 2011:2015 are either 0 or 1.
library(data.table)
# make it a data.table
setDT(df)
#
df[, occured := as.numeric(
frollsum(event, n = 3, align = "right", fill = event[1]) > 0),
by = .(id)]
# id time event occured
# 1: 1 2011 0 0
# 2: 1 2012 0 0
# 3: 1 2013 0 0
# 4: 1 2014 1 1
# 5: 1 2015 1 1
# 6: 2 2011 0 0
# 7: 2 2012 0 0
# 8: 2 2013 1 1
# 9: 2 2014 0 1
#10: 2 2015 0 1
#11: 3 2011 1 1
#12: 3 2012 0 1
#13: 3 2013 0 1
#14: 3 2014 0 0
#15: 3 2015 0 0
#16: 4 2011 1 1
#17: 4 2012 0 1
#18: 4 2013 0 1
#19: 4 2014 0 0
#20: 4 2015 1 1
# ...