I am doing an event study in an unbalanced panal data set. The basic structure is that I have a different number of observations (deliveries) for each firm at different points over a period of around 15 years. I am interested in an event (price increase) which is coded as a dummy variable if it occurs and some dummy lead and lags to check if the effect of the price increase on my dependent variable becomes apparent around that event. As an example, for some firms the price increase occurs at 5 deliveries of e.g. 50 over 15 years.
However, now I also want to "simulate" the same event study one year after and before to improve inference. So I want R to duplicate the event dummy for each firm at the delivery closest to one year before and after. The delivery dates occur not daily but on average every 25 days.
So, as code, the data looks something like this:
df <- data.frame(firm_id = c(1,1,1,1,1,2,2,2,3,3,3,3,3,3,3,3,3,3,4,4,4,4),
delivery_id = c(1,2,6,9,15,3,5,18,4,7,8,10,11,13,17,19,22,12,14,16,20,21),
date=c("2004-06-16", "2004-08-12", "2004-11-22", "2005-07-03", "2007-01-04",
"2004-09-07", "2005-02-01", "2006-01-17",
"2004-10-11", "2005-02-01", "2005-04-27", "2005-06-01", "2005-07-01",
"2006-01-03", "2007-01-06", "2007-03-24", "2007-05-03",
"2005-08-03", "2006-02-19", "2006-06-13", "2007-02-04", "2007-04-26"),
price_increase = c(0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0),
price_increase_year_before = c(1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0),
price_increase_year_afer = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0))
Creating
firm_id delivery_id date price_increase price_increase_year_before price_increase_year_after
1 1 1 2004-06-16 0 1 0
2 1 2 2004-08-12 0 0 0
3 1 6 2004-11-22 0 0 0
4 1 9 2005-07-03 1 0 0
5 1 15 2007-01-04 0 0 0
6 2 3 2004-09-07 0 0 0
7 2 5 2005-02-01 0 0 0
8 2 18 2006-01-17 0 0 0
9 3 4 2004-10-11 0 0 0
10 3 7 2005-02-01 0 1 0
11 3 8 2005-04-27 0 0 0
12 3 10 2005-06-01 0 0 0
13 3 11 2005-07-01 0 0 0
14 3 13 2006-01-03 1 0 0
15 3 17 2007-01-06 0 0 1
16 3 19 2007-03-24 0 0 0
17 3 22 2007-05-03 0 0 0
18 3 12 2005-08-03 0 0 0
19 4 14 2006-02-19 0 0 0
20 4 16 2006-06-13 0 0 0
21 4 20 2007-02-04 0 0 0
22 4 21 2007-04-26 0 0 0
Where I want to create the two dummy columns on the right based on the price_increase and the date, for each firm. Although I would start with dyplr's group_by
and mutate
approach and an if_else
function, I have no idea how to create a condition that becomes TRUE
when a delivery in one year is +1/-1 month close to the date in the prior or following year and how to select the respective delivery. Do you guys have an idea?
Here is a possible approach using dplyr
.
After group_by(firm_id)
, filter
and include groups where there was a price increase.
Then, create your two dummy variables if the date is one year (+/- 30 days) before or after the date where the price_increase
was equal to 1. Then, would filter
for rows that met these criteria.
Using distinct
you can prevent multiples or duplicates for your dummy variable within a group/firm. Otherwise, if your deliveries were 25 days apart, it seemed like a theoretical possibility.
The rest afterwards is joining back to the original data, replacing the NA
with zero for dummy columns, and sorting.
library(dplyr)
df$date <- as.Date(df$date)
df %>%
group_by(firm_id) %>%
filter(any(price_increase == 1)) %>%
mutate(
price_increase_year_before = ifelse(
between(date[price_increase == 1] - date, 335, 395), 1, 0),
price_increase_year_after = ifelse(
between(date - date[price_increase == 1], 335, 395), 1, 0),
) %>%
filter(price_increase_year_before == 1 | price_increase_year_after == 1) %>%
distinct(firm_id, price_increase_year_before, price_increase_year_after, .keep_all = TRUE) %>%
right_join(df) %>%
replace_na(list(price_increase_year_before = 0, price_increase_year_after = 0)) %>%
arrange(firm_id, date)
Output
firm_id delivery_id date price_increase price_increase_year_before price_increase_year_after
<dbl> <dbl> <date> <dbl> <dbl> <dbl>
1 1 1 2004-06-16 0 1 0
2 1 2 2004-08-12 0 0 0
3 1 6 2004-11-22 0 0 0
4 1 9 2005-07-03 1 0 0
5 1 15 2007-01-04 0 0 0
6 2 3 2004-09-07 0 0 0
7 2 5 2005-02-01 0 0 0
8 2 18 2006-01-17 0 0 0
9 3 4 2004-10-11 0 0 0
10 3 7 2005-02-01 0 1 0
11 3 8 2005-04-27 0 0 0
12 3 10 2005-06-01 0 0 0
13 3 11 2005-07-01 0 0 0
14 3 12 2005-08-03 0 0 0
15 3 13 2006-01-03 1 0 0
16 3 17 2007-01-06 0 0 1
17 3 19 2007-03-24 0 0 0
18 3 22 2007-05-03 0 0 0
19 4 14 2006-02-19 0 0 0
20 4 16 2006-06-13 0 0 0
21 4 20 2007-02-04 0 0 0
22 4 21 2007-04-26 0 0 0