I have two datasets. The first dataset looks like -
data.frame (year = rep(2005:2022,2), CompanyID = rep(6:7, each = 18))
year CompanyID
1 2005 6
2 2006 6
3 2007 6
4 2008 6
5 2009 6
6 2010 6
7 2011 6
8 2012 6
9 2013 6
10 2014 6
11 2015 6
12 2016 6
13 2017 6
14 2018 6
15 2019 6
16 2020 6
17 2021 6
18 2022 6
19 2005 7
20 2006 7
21 2007 7
22 2008 7
23 2009 7
24 2010 7
25 2011 7
26 2012 7
27 2013 7
28 2014 7
29 2015 7
30 2016 7
31 2017 7
32 2018 7
33 2019 7
34 2020 7
35 2021 7
36 2022 7
The second data set looks like this -
data.frame (year = c(2012, 2007, 2015, 2009), CompanyID = c(6,6,7,7))
year CompanyID
1 2012 6
2 2007 6
3 2015 7
4 2009 7
Now I want the following -
In the first dataset, I want to keep the observations one year before (t-1
) and one year after (t+1
) of the year
variable from second dataset as the CompanyID
in first dataset and second dataset are the same. So my expected output should look like this -
year CompanyID POST
1 2006 6 0
2 2008 6 1
3 2011 6 0
4 2013 6 1
5 2008 7 0
6 2010 7 1
7 2014 7 0
8 2016 7 1
The POST
variable is also needed to identify the observations before year (t-1
) or after year (t+1
)
I have many observations for many CompanyID
, but here I show data only for two CompanyID
, which are 6 and 7.
First modify your second dataset (df_ref
in my code) to include one year before and after the target, also a POST
column specifying if it's before or after. Then join the two together.
library(dplyr)
df_data <- data.frame (year = rep(2005:2022,2), CompanyID = rep(6:7, each = 18))
df_ref <- data.frame (year = c(2012, 2007, 2015, 2009), CompanyID = c(6,6,7,7))
df_ref |>
reframe(year = c(year - 1, year + 1), POST = c(0, 1), .by = c(year, CompanyID)) |>
left_join(df_data, by = c("year", "CompanyID"))
year CompanyID POST
1 2011 6 0
2 2013 6 1
3 2006 6 0
4 2008 6 1
5 2014 7 0
6 2016 7 1
7 2008 7 0
8 2010 7 1