I have a panel data of Brazilian municipalities comprising the years 1995-2013. Code is the municipality ID. Nome is the municipality name, and ano is the year of the adoption of a public policy (zero means that they never implemented the policy.)
code Nome ano
1 1100015 ALTA FLORESTA DOESTE 2010
2 1100023 ARIQUEMES 2006
3 1100031 CABIXI 0
4 1100049 CACOAL 2006
5 1100056 CEREJEIRAS 2014
6 1100064 COLORADO DO OESTE 2006
I have the following question:
How do I generate a dummy variable in R with value 1 after the adoption of a public policy by municipality X in year Z, and 0 otherwise?
Below you can see the expected output in a wide format, which is easier to visualize. (However, I will transform the data into a long format to merge with other data. As you can see, ones in this example only start after 2006, or never.)
code Nome 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
1 1100015 ALTA FLORESTA DOESTE 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
2 1100023 ARIQUEMES 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
3 1100031 CABIXI 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 1100049 CACOAL 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
5 1100056 CEREJEIRAS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 1100064 COLORADO DO OESTE 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
I have found similar questions in this forum, but not the one I'm asking.
Here is an option using tidyverse
. Create a list
column with sequence from each value of 'ano' to 2013, unnest
the list
column, create a column of 1s ('val'), grouped by 'code', 'Nome', expand the dataset with complete
for sequence from 1995 to 2013, while fill
ing the 'val' with 0 where the combination is not present, ungroup
and reshape it to 'wide' format with pivot_wider
library(dplyr)
library(tidyr) #1.0.0
library(purrr)
df1 %>%
mutate(ano = case_when(between(ano, 1995, 2013) ~
map(ano, ~ .x:2013), TRUE ~ list(NA_integer_))) %>%
unnest(ano) %>%
mutate(val = 1) %>%
group_by(code, Nome) %>%
complete(ano = 1995:2013, fill = list(val = 0)) %>%
ungroup %>%
filter(!is.na(ano)) %>%
pivot_wider(names_from = ano, values_from = val)
# A tibble: 6 x 21
# code Nome `1995` `1996` `1997` `1998` `1999` `2000` `2001` `2002` `2003` `2004` `2005` `2006` `2007` `2008` `2009` `2010` `2011` `2012` `2013`
# <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1100015 ALTA FLORESTA DOESTE 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
#2 1100023 ARIQUEMES 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
#3 1100031 CABIXI 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#4 1100049 CACOAL 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
#5 1100056 CEREJEIRAS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#6 1100064 COLORADO DO OESTE 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
df1 <- structure(list(code = c(1100015L, 1100023L, 1100031L, 1100049L,
1100056L, 1100064L), Nome = c("ALTA FLORESTA DOESTE", "ARIQUEMES",
"CABIXI", "CACOAL", "CEREJEIRAS", "COLORADO DO OESTE"), ano = c(2010L,
2006L, 0L, 2006L, 2014L, 2006L)), class = "data.frame", row.names = c(NA,
-6L))