Search code examples
rdplyrdummy-variable

creating a dummy variable with consecutive cases


I have a similar problem like this one: How can I create a dummy variable over consecutive values by group id?

the difference is: as soon I have the Dummy = 1 I want my dummy for the rest of my group (ID) beeing 1 since year is in descending order. So for example, out of df1:

 df1 <-data.frame(ID = rep(seq(1:3), each = 4),
           year = rep(c(2014, 2015, 2016, 2017),3),
           value = runif(12, min = 0, max = 25),
           Dummy = c(0,0,1,0 ,0,1,0,1, 1,0,0,0))

shall be :

df2 <- data.frame(ID = rep(seq(1:4), 3),
           year = rep(c(2014, 2015, 2016, 2017),3),
           value = runif(12, min = 0, max = 25),
           Dummy = c(0,0,1,1 ,0,1,1, 1, 1,1,1,1))

I've tried something like that (and some others) but that failed:

df2<- df1%>% group_by(ID) %>% arrange(ID , year) %>% 
                        mutate(treated  =  case_when(Dummy == 1 ~ 1,
                        lag(Dummy, n= unique(n()), default = 0) == 1 ~ 1))

Solution

  • If your input data is as below then we can just use cummax():

    library(dplyr)
    
    
    df1 <-data.frame(ID = rep(seq(1:3), each = 4),
                     year = rep(c(2014, 2015, 2016, 2017),3),
                     value = runif(12, min = 0, max = 25),
                     Dummy = c(0,0,1,0 ,0,1,0,1, 1,0,0,0))
    df1
    #>    ID year     value Dummy
    #> 1   1 2014 14.144996     0
    #> 2   1 2015 20.621603     0
    #> 3   1 2016  8.325170     1
    #> 4   1 2017 21.725028     0
    #> 5   2 2014 11.894383     0
    #> 6   2 2015 13.445744     1
    #> 7   2 2016  3.332338     0
    #> 8   2 2017  2.984941     1
    #> 9   3 2014 17.551266     1
    #> 10  3 2015  5.250556     0
    #> 11  3 2016 11.062577     0
    #> 12  3 2017 20.169439     0
    
    
    df1 %>%
      group_by(ID) %>%
      mutate(Dummy = cummax(Dummy))
    
    #> # A tibble: 12 x 4
    #> # Groups:   ID [3]
    #>       ID  year value Dummy
    #>    <int> <dbl> <dbl> <dbl>
    #>  1     1  2014 14.1      0
    #>  2     1  2015 20.6      0
    #>  3     1  2016  8.33     1
    #>  4     1  2017 21.7      1
    #>  5     2  2014 11.9      0
    #>  6     2  2015 13.4      1
    #>  7     2  2016  3.33     1
    #>  8     2  2017  2.98     1
    #>  9     3  2014 17.6      1
    #> 10     3  2015  5.25     1
    #> 11     3  2016 11.1      1
    #> 12     3  2017 20.2      1
    

    Created on 2022-10-14 by the reprex package (v2.0.1)