I have a list of year-country-specific dummies, and I also want to mark the two years before each marked year.
The data looks like this:
```r
library(tidyverse)

df <- tribble(
  ~year, ~country, ~occurrence,
  2003, "USA", 1,
  2004, "USA", 0,
  2005, "USA", 0,
  2006, "USA", 0,
  2007, "USA", 0,
  2008, "USA", 0,
  2009, "USA", 0,
  2010, "USA", 0,
  2011, "USA", 1,
  2012, "USA", 0,
  2013, "USA", 0,
  2005, "FRA", 0,
  2006, "FRA", 0,
  2007, "FRA", 1,
  2008, "FRA", 1,
  2009, "FRA", 0,
  2010, "FRA", 0,
  2011, "FRA", 0,
  2012, "FRA", 0,
  2013, "FRA", 0,
  2014, "FRA", 0,
  2015, "FRA", 1
)
```
So for USA I also want to put a 1 into the column `occurrence` for the years 2009 and 2010, and for FRA for the years 2005, 2006, 2013 and 2014.
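As a quick check of the rule above, the sketch below lists, for every marked year, the two years before it and keeps only those that actually exist in the data. It assumes dplyr and tibble are installed; `df` is rebuilt compactly with the same values as the question's data, and the name `targets` is purely illustrative. Note that FRA 2007 shows up too (it is two years before the marked 2008), but it is already 1, which is why the question does not list it.

```r
library(dplyr)
library(tibble)

# Same values as the question's df, built compactly
df <- tibble(
  year = c(2003:2013, 2005:2015),
  country = rep(c("USA", "FRA"), each = 11),
  occurrence = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
                 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1)
)

marked <- df %>% filter(occurrence == 1)

# One and two years before every marked year (illustrative helper table)
targets <- bind_rows(
  marked %>% mutate(year = year - 1L),
  marked %>% mutate(year = year - 2L)
) %>%
  select(country, year) %>%
  distinct() %>%
  semi_join(df, by = c("country", "year")) %>%  # drop years not in the data, e.g. USA 2001/2002
  arrange(country, year)

targets
```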
I thought about doing something like this:
```r
df %>%
  group_by(country) %>%
  mutate(occurrence = ifelse("not sure what to put here", 1, 0))
```
But I'm not sure how to tell R to pick out only the years I want.
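One way to express "the years I want" directly is to compute them from the year values: within each country, flag any year that equals a marked year minus 1 or minus 2. This is a minimal sketch, assuming dplyr and tibble are installed; `df` is rebuilt with the same values as above, and `marked_before` is an illustrative name. Because it compares years rather than row positions, it does not depend on the rows being sorted or on every year being present.

```r
library(dplyr)
library(tibble)

# Same values as the question's df
df <- tibble(
  year = c(2003:2013, 2005:2015),
  country = rep(c("USA", "FRA"), each = 11),
  occurrence = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
                 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1)
)

marked_before <- df %>%
  group_by(country) %>%
  mutate(
    # On the right-hand side, occurrence is still the original column,
    # so year[occurrence == 1] are this country's marked years
    occurrence = if_else(
      year %in% c(year[occurrence == 1] - 1, year[occurrence == 1] - 2),
      1, occurrence
    )
  ) %>%
  ungroup()

marked_before
```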
Here is another dplyr solution:
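```r
df %>%
  group_by(country) %>%
  mutate(
    # Set 1 if either of the next two rows of the same country is a 1;
    # assumes one row per year, sorted by year
    occurrence = ifelse(lead(occurrence, 1) %in% 1 |
                          lead(occurrence, 2) %in% 1,
                        1, occurrence)
  )
```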
```
# A tibble: 22 x 3
# Groups:   country [2]
    year country occurrence
   <dbl> <chr>        <dbl>
 1  2003 USA              1
 2  2004 USA              0
 3  2005 USA              0
 4  2006 USA              0
 5  2007 USA              0
 6  2008 USA              0
 7  2009 USA              1
 8  2010 USA              1
 9  2011 USA              1
10  2012 USA              0
11  2013 USA              0
12  2005 FRA              1
13  2006 FRA              1
14  2007 FRA              1
15  2008 FRA              1
16  2009 FRA              0
17  2010 FRA              0
18  2011 FRA              0
19  2012 FRA              0
20  2013 FRA              1
21  2014 FRA              1
22  2015 FRA              1
```
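`lead(occurrence, 1) %in% 1` is used instead of `lead(occurrence, 1) == 1` because the latter cannot handle `NA`: `lead()` returns `NA` for the last rows of each country, `NA == 1` is `NA`, and `ifelse()` then returns `NA` instead of keeping the original value. `%in%` returns `FALSE` in that case.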
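A small sketch of that difference on a toy vector (assuming dplyr is installed; `occ` is an illustrative name). With `==`, the comparison against a missing `lead()` value yields `NA`, and `FALSE | NA` is still `NA`, so `ifelse()` overwrites even an existing 1 with `NA`; with `%in%` the original values survive.

```r
library(dplyr)

occ <- c(0, 1, 0)

lead(occ, 1)          # 1 0 NA  (the last element has no successor)
lead(occ, 1) == 1     # TRUE FALSE NA
lead(occ, 1) %in% 1   # TRUE FALSE FALSE

with_eq <- ifelse(lead(occ, 1) == 1 | lead(occ, 2) == 1, 1, occ)
with_in <- ifelse(lead(occ, 1) %in% 1 | lead(occ, 2) %in% 1, 1, occ)

with_eq  # 1 NA NA
with_in  # 1 1 0
```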