
Subdividing panel data to apply a function


I'm trying to add a dummy-variable column to panel data that records whether a treatment was applied to a firm. If a treatment (grant) was applied in any year, the variable should equal 1 in all years for that firm. I suspect lapply/sapply or a dplyr group_by() is the right tool, but I'm not sure how to apply it. Below is the original data:

head(q3data_a)
# A tibble: 6 x 30
   year  fcode employ  sales avgsal scrap rework tothrs union grant   d89   d88 totrain hrsemp lscrap lemploy
  <int>  <dbl>  <int>  <dbl>  <dbl> <dbl>  <dbl>  <int> <int> <int> <int> <int>   <int>  <dbl>  <dbl>   <dbl>
1  1987 410032    100 4.70e7  35000    NA     NA     12     0     0     0     0     100  12        NA    4.61
2  1988 410032    131 4.30e7  37000    NA     NA      8     0     0     0     1      50   3.05     NA    4.88
3  1987 410440     12 1.56e6  10500    NA     NA     12     0     0     0     0      12  12        NA    2.48
4  1988 410440     13 1.97e6  11000    NA     NA     12     0     0     0     1      13  12        NA    2.56
5  1987 410495     20 7.50e5  17680    NA     NA     50     0     0     0     0      15  37.5      NA    3.00
6  1988 410495     25 1.10e5  18720    NA     NA     50     0     0     0     1      10  20        NA    3.22
# ... with 14 more variables: lsales <dbl>, lrework <dbl>, lhrsemp <dbl>, lscrap_1 <dbl>, grant_1 <int>,
#   clscrap <dbl>, cgrant <int>, clemploy <dbl>, clsales <dbl>, lavgsal <dbl>, clavgsal <dbl>,
#   cgrant_1 <int>, chrsemp <dbl>, clhrsemp <dbl>

And below is my ad-hoc solution. It works, but it does not generalize: it relies on each firm having exactly two rows (1987, then 1988), so it would be difficult to extend to more than two time periods.

# Encode the treatment across all time periods: if a firm receives a grant
# in 1988, it gets a 1 in both its 1987 and 1988 rows.
dummy1 <- rep(0, nrow(q3data_a))
for (i in seq_len(nrow(q3data_a))) {
  # Even rows are the 1988 observations (rows alternate 1987, 1988 per firm)
  if (i %% 2 == 0 && q3data_a$grant[i] == 1) {
    dummy1[i - 1] <- 1
    dummy1[i] <- 1
  }
}

Thanks for any advice.


Solution

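  • Is this what you need? Group by firm (fcode) and set the dummy to 1 for every row of a firm that has a grant in at least one year: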

    library(dplyr)
    df %>% group_by(fcode) %>% mutate(dummy1 = as.integer(any(grant > 0)))
    

    df looks like this:

    # A tibble: 12 x 3
        year  fcode grant
       <int>  <dbl> <int>
     1  1985 410032     0
     2  1986 410032     1
     3  1987 410032     1
     4  1988 410032     1
     5  1985 410440     1
     6  1986 410440     0
     7  1987 410440     1
     8  1988 410440     1
     9  1985 410495     0
    10  1986 410495     0
    11  1987 410495     0
    12  1988 410495     0
    

    Output is

    # A tibble: 12 x 4
    # Groups:   fcode [3]
        year  fcode grant dummy1
       <int>  <dbl> <int>  <int>
     1  1985 410032     0      1
     2  1986 410032     1      1
     3  1987 410032     1      1
     4  1988 410032     1      1
     5  1985 410440     1      1
     6  1986 410440     0      1
     7  1987 410440     1      1
     8  1988 410440     1      1
     9  1985 410495     0      0
    10  1986 410495     0      0
    11  1987 410495     0      0
    12  1988 410495     0      0
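
Since the question also mentions lapply/sapply, here is a minimal base R sketch of the same idea, assuming the example df above. It uses ave(), which applies a function within each group and returns a vector aligned with the original rows. The data frame is rebuilt here only so the snippet runs on its own; the column names follow the example, not the full q3data_a.

```r
# Toy data with the same values as the example df above.
df <- data.frame(
  year  = rep(1985:1988, times = 3),
  fcode = rep(c(410032, 410440, 410495), each = 4),
  grant = c(0L, 1L, 1L, 1L,  1L, 0L, 1L, 1L,  0L, 0L, 0L, 0L)
)

# Within each fcode group, check whether any year has a grant.
# The single 0/1 result is recycled to every row of that firm.
df$dummy1 <- ave(df$grant, df$fcode,
                 FUN = function(g) as.integer(any(g > 0)))

print(df)
```

If grant can contain missing values, any(g > 0, na.rm = TRUE) would treat them as "no grant" rather than propagating NA; whether that is appropriate depends on the data.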