Search code examples
rdummy-variablepanel-data

Create a Variable with value one after "treatment" in year X, and 0 otherwise


I have a panel data of Brazilian municipalities comprising the years 1995-2013. Code is the municipality ID. Nome is the municipality name, and ano is the year of the adoption of a public policy (zero means that they never implemented the policy.)

     code                 Nome  ano
1 1100015 ALTA FLORESTA DOESTE 2010
2 1100023            ARIQUEMES 2006
3 1100031               CABIXI    0
4 1100049               CACOAL 2006
5 1100056           CEREJEIRAS 2014
6 1100064    COLORADO DO OESTE 2006

I have the following question:

How do I generate a dummy variable in R with value 1 after the adoption of a public policy by municipality X in year Z, and 0 otherwise?

Below you can see the expected output in a wide format, which is easier to visualize. (However, I will transform the data into a long format to merge with other data. As you can see, ones in this example only start after 2006, or never.)

      code                      Nome 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
1  1100015      ALTA FLORESTA DOESTE    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    1    1    1
2  1100023                 ARIQUEMES    0    0    0    0    0    0    0    0    0    0    0    1    1    1    1    1    1    1    1
3  1100031                    CABIXI    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
4  1100049                    CACOAL    0    0    0    0    0    0    0    0    0    0    0    1    1    1    1    1    1    1    1
5  1100056                CEREJEIRAS    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
6  1100064         COLORADO DO OESTE    0    0    0    0    0    0    0    0    0    0    0    1    1    1    1    1    1    1    1

I have found similar questions in this forum, but not the one I'm asking.


Solution

  • Here is an option using tidyverse. Create a list column with sequence from each value of 'ano' to 2013, unnest the list column, create a column of 1s ('val'), grouped by 'code', 'Nome', expand the dataset with complete for sequence from 1995 to 2013, while filling the 'val' with 0 where the combination is not present, ungroup and reshape it to 'wide' format with pivot_wider

    library(dplyr)
    library(tidyr) #1.0.0
    library(purrr)
    df1 %>%
       mutate(ano = case_when(between(ano, 1995, 2013) ~ 
            map(ano, ~ .x:2013), TRUE ~ list(NA_integer_))) %>% 
       unnest(ano) %>% 
       mutate(val = 1) %>%
       group_by(code, Nome) %>% 
       complete(ano = 1995:2013, fill = list(val = 0)) %>%
       ungroup %>% 
       filter(!is.na(ano)) %>%
       pivot_wider(names_from = ano, values_from = val)
    # A tibble: 6 x 21
    #     code Nome                 `1995` `1996` `1997` `1998` `1999` `2000` `2001` `2002` `2003` `2004` `2005` `2006` `2007` `2008` `2009` `2010` `2011` `2012` `2013`
    #    <int> <chr>                 <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
    #1 1100015 ALTA FLORESTA DOESTE      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      1      1      1      1
    #2 1100023 ARIQUEMES                 0      0      0      0      0      0      0      0      0      0      0      1      1      1      1      1      1      1      1
    #3 1100031 CABIXI                    0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
    #4 1100049 CACOAL                    0      0      0      0      0      0      0      0      0      0      0      1      1      1      1      1      1      1      1
    #5 1100056 CEREJEIRAS                0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
    #6 1100064 COLORADO DO OESTE         0      0      0      0      0      0      0      0      0      0      0      1      1      1      1      1      1      1      1
    

    data

    df1 <- structure(list(code = c(1100015L, 1100023L, 1100031L, 1100049L, 
    1100056L, 1100064L), Nome = c("ALTA FLORESTA DOESTE", "ARIQUEMES", 
    "CABIXI", "CACOAL", "CEREJEIRAS", "COLORADO DO OESTE"), ano = c(2010L, 
    2006L, 0L, 2006L, 2014L, 2006L)), class = "data.frame", row.names = c(NA, 
    -6L))