Search code examples
rdplyrdummy-variablepanel-data

Mutate dummy variable with observation before and after


I've a title-day panel data set (df1). For every title and given day the volume (volume) is coded. There is a variable that you could see as a treatment (v1). In this dataset there is always a treatment, but the day the treatment starts differs per title. When the treatment starts, it stays till the end of the period.

title <- rep(c("x", "y", "z"), each = 5)
day <- rep(c(0,1,2,3,4), times = 3)
volume <- c(0,0,1,1,2,3,0,0,0,0,3,3,4,2,1)
v1 <- c(0,0,1,1,1,0,1,1,1,1,0,0,0,1,1)
df1 <- data.frame(title,day,volume,v1)

I try to mutate a dummy variable that indicates whether the title got any volume (non zero) before AND after the treatment is in place. Where 1 is coded in situations when the title got volume before and after the treatment started. 0 is coded when the title got no volume before or no volume after the treatment started. The dataframe should look like this:

title <- rep(c("x", "y", "z"), each = 5)
day <- rep(c(0,1,2,3,4), times = 3)
volume <- c(0,0,1,1,2,3,0,0,0,0,3,3,4,2,1)
v1 <- c(0,0,1,1,1,0,1,1,1,1,0,0,0,1,1)
new_v <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1)
output <- data.frame(title,day,volume,v1,new_v)

Hope you guys can help me out here.


Solution

  • Here is an approach using dplyr:

    library(dplyr)
    
    df1 %>% 
      group_by(title, v1) %>% 
      mutate(summe = sum(volume)) %>% 
      group_by(title) %>% 
      mutate(dummy_volume = all(summe > 0)) %>% 
      select(-summe)
    
    # A tibble: 15 x 5
    # Groups:   title [3]
       title   day volume    v1 dummy_volume
       <fct> <dbl>  <dbl> <dbl> <lgl>       
     1 x         0      0     0 FALSE       
     2 x         1      0     0 FALSE       
     3 x         2      1     1 FALSE       
     4 x         3      1     1 FALSE       
     5 x         4      2     1 FALSE       
     6 y         0      3     0 FALSE       
     7 y         1      0     1 FALSE       
     8 y         2      0     1 FALSE       
     9 y         3      0     1 FALSE       
    10 y         4      0     1 FALSE       
    11 z         0      3     0 TRUE        
    12 z         1      3     0 TRUE        
    13 z         2      4     0 TRUE        
    14 z         3      2     1 TRUE        
    15 z         4      1     1 TRUE 
    

    With the Dummy coded as 0/1 as in your desired output:

    df1 %>% 
      group_by(title, v1) %>% 
      mutate(summe = sum(volume)) %>% 
      group_by(title) %>% 
      mutate(dummy_volume = as.integer(all(summe > 0))) %>% 
      select(-summe)
    
    # A tibble: 15 x 5
    # Groups:   title [3]
       title   day volume    v1 dummy_volume
       <fct> <dbl>  <dbl> <dbl>        <int>
     1 x         0      0     0            0
     2 x         1      0     0            0
     3 x         2      1     1            0
     4 x         3      1     1            0
     5 x         4      2     1            0
     6 y         0      3     0            0
     7 y         1      0     1            0
     8 y         2      0     1            0
     9 y         3      0     1            0
    10 y         4      0     1            0
    11 z         0      3     0            1
    12 z         1      3     0            1
    13 z         2      4     0            1
    14 z         3      2     1            1
    15 z         4      1     1            1