I'm trying to add a dummy-variable column to panel data recording whether a treatment was applied to a firm. If a treatment (grant) was applied in a particular year, the variable should equal 1 for all years corresponding to that firm. I suspect the lapply/sapply functions or dplyr's group_by() would be appropriate, but I'm not really sure how to apply them. Below is the original data:
head(q3data_a)
# A tibble: 6 x 30
year fcode employ sales avgsal scrap rework tothrs union grant d89 d88 totrain hrsemp lscrap lemploy
<int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 1987 410032 100 4.70e7 35000 NA NA 12 0 0 0 0 100 12 NA 4.61
2 1988 410032 131 4.30e7 37000 NA NA 8 0 0 0 1 50 3.05 NA 4.88
3 1987 410440 12 1.56e6 10500 NA NA 12 0 0 0 0 12 12 NA 2.48
4 1988 410440 13 1.97e6 11000 NA NA 12 0 0 0 1 13 12 NA 2.56
5 1987 410495 20 7.50e5 17680 NA NA 50 0 0 0 0 15 37.5 NA 3.00
6 1988 410495 25 1.10e5 18720 NA NA 50 0 0 0 1 10 20 NA 3.22
# ... with 14 more variables: lsales <dbl>, lrework <dbl>, lhrsemp <dbl>, lscrap_1 <dbl>, grant_1 <int>,
# clscrap <dbl>, cgrant <int>, clemploy <dbl>, clsales <dbl>, lavgsal <dbl>, clavgsal <dbl>,
# cgrant_1 <int>, chrsemp <dbl>, clhrsemp <dbl>
Below is my ad-hoc solution. It works, but it does not generalize (for example, it would be difficult to extend to more than two time periods).
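# Encodes the treatment across all time periods, so if a firm receives
# a treatment in 1988, it also receives a 1 in 1987.
# Relies on each firm having exactly two consecutive rows (1987, then 1988).
dummy1 <- rep(0, nrow(q3data_a))
for (i in seq_len(nrow(q3data_a))) {
  if (i %% 2 == 0) {                 # even rows are the second (1988) observation
    if (q3data_a$grant[i] == 1) {
      dummy1[i - 1] <- 1             # flag the firm's 1987 row
      dummy1[i] <- 1                 # flag the firm's 1988 row
    }
  }
}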
Thanks for any advice.
Is this what you need?
library(dplyr)
df %>% group_by(fcode) %>% mutate(dummy1 = as.integer(any(grant > 0)))
Here df looks like this:
# A tibble: 12 x 3
year fcode grant
<int> <dbl> <int>
1 1985 410032 0
2 1986 410032 1
3 1987 410032 1
4 1988 410032 1
5 1985 410440 1
6 1986 410440 0
7 1987 410440 1
8 1988 410440 1
9 1985 410495 0
10 1986 410495 0
11 1987 410495 0
12 1988 410495 0
The output is:
# A tibble: 12 x 4
# Groups: fcode [3]
year fcode grant dummy1
<int> <dbl> <int> <int>
1 1985 410032 0 1
2 1986 410032 1 1
3 1987 410032 1 1
4 1988 410032 1 1
5 1985 410440 1 1
6 1986 410440 0 1
7 1987 410440 1 1
8 1988 410440 1 1
9 1985 410495 0 0
10 1986 410495 0 0
11 1987 410495 0 0
12 1988 410495 0 0
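Since the question also mentions base R, here is a minimal sketch of the same idea without dplyr, using ave() to compute a per-firm value and spread it back over every row of that firm. The data frame below is rebuilt from the example table above. The na.rm = TRUE argument is an assumption on my part: it treats a missing grant as "no grant observed", which may or may not suit your data.

```r
# Toy data mirroring the example table above (same years, firm codes, grants).
df <- data.frame(
  year  = rep(1985:1988, times = 3),
  fcode = rep(c(410032, 410440, 410495), each = 4),
  grant = c(0L, 1L, 1L, 1L,
            1L, 0L, 1L, 1L,
            0L, 0L, 0L, 0L)
)

# ave() applies FUN to the grant values within each fcode group and returns
# a vector aligned with the original rows, so the single per-firm result is
# repeated for every year of that firm.
df$dummy1 <- ave(df$grant, df$fcode,
                 FUN = function(g) as.integer(any(g > 0, na.rm = TRUE)))

print(df)
```

Neither this nor the group_by() version depends on row order or on the number of periods per firm, which is what the loop in the question could not handle. In the dplyr version, the result stays grouped by fcode (note the "# Groups" line in the output), so adding ungroup() afterwards may be worthwhile before further processing.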