Search code examples
rdatedummy-variablepanel-data

Create event (dummy) one year before/ after of a dummy variable (or close to)


I am doing an event study in an unbalanced panal data set. The basic structure is that I have a different number of observations (deliveries) for each firm at different points over a period of around 15 years. I am interested in an event (price increase) which is coded as a dummy variable if it occurs and some dummy lead and lags to check if the effect of the price increase on my dependent variable becomes apparent around that event. As an example, for some firms the price increase occurs at 5 deliveries of e.g. 50 over 15 years.

However, now I also want to "simulate" the same event study one year after and before to improve inference. So I want R to duplicate the event dummy for each firm at the delivery closest to one year before and after. The delivery dates occur not daily but on average every 25 days.

So, as code, the data looks something like this:

df <- data.frame(firm_id = c(1,1,1,1,1,2,2,2,3,3,3,3,3,3,3,3,3,3,4,4,4,4),
                   delivery_id = c(1,2,6,9,15,3,5,18,4,7,8,10,11,13,17,19,22,12,14,16,20,21),
                   date=c("2004-06-16", "2004-08-12", "2004-11-22", "2005-07-03", "2007-01-04",
                          "2004-09-07", "2005-02-01", "2006-01-17", 
                          "2004-10-11", "2005-02-01", "2005-04-27", "2005-06-01", "2005-07-01",
                          "2006-01-03", "2007-01-06", "2007-03-24", "2007-05-03", 
                          "2005-08-03", "2006-02-19", "2006-06-13", "2007-02-04", "2007-04-26"),
                   price_increase = c(0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0),
                   price_increase_year_before = c(1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0),
                   price_increase_year_afer = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0))

Creating

     firm_id delivery_id  date     price_increase  price_increase_year_before   price_increase_year_after
1        1           1 2004-06-16              0                          1                        0
2        1           2 2004-08-12              0                          0                        0
3        1           6 2004-11-22              0                          0                        0
4        1           9 2005-07-03              1                          0                        0
5        1          15 2007-01-04              0                          0                        0
6        2           3 2004-09-07              0                          0                        0
7        2           5 2005-02-01              0                          0                        0
8        2          18 2006-01-17              0                          0                        0
9        3           4 2004-10-11              0                          0                        0
10       3           7 2005-02-01              0                          1                        0
11       3           8 2005-04-27              0                          0                        0
12       3          10 2005-06-01              0                          0                        0
13       3          11 2005-07-01              0                          0                        0
14       3          13 2006-01-03              1                          0                        0
15       3          17 2007-01-06              0                          0                        1
16       3          19 2007-03-24              0                          0                        0
17       3          22 2007-05-03              0                          0                        0
18       3          12 2005-08-03              0                          0                        0
19       4          14 2006-02-19              0                          0                        0
20       4          16 2006-06-13              0                          0                        0
21       4          20 2007-02-04              0                          0                        0
22       4          21 2007-04-26              0                          0                        0

Where I want to create the two dummy columns on the right based on the price_increase and the date, for each firm. Although I would start with dyplr's group_by and mutate approach and an if_else function, I have no idea how to create a condition that becomes TRUE when a delivery in one year is +1/-1 month close to the date in the prior or following year and how to select the respective delivery. Do you guys have an idea?


Solution

  • Here is a possible approach using dplyr.

    After group_by(firm_id), filter and include groups where there was a price increase.

    Then, create your two dummy variables if the date is one year (+/- 30 days) before or after the date where the price_increase was equal to 1. Then, would filter for rows that met these criteria.

    Using distinct you can prevent multiples or duplicates for your dummy variable within a group/firm. Otherwise, if your deliveries were 25 days apart, it seemed like a theoretical possibility.

    The rest afterwards is joining back to the original data, replacing the NA with zero for dummy columns, and sorting.

    library(dplyr)
    
    df$date <- as.Date(df$date)
    
    df %>%
      group_by(firm_id) %>%
      filter(any(price_increase == 1)) %>%
      mutate(
        price_increase_year_before = ifelse(
          between(date[price_increase == 1] - date, 335, 395), 1, 0),
        price_increase_year_after = ifelse(
          between(date - date[price_increase == 1], 335, 395), 1, 0),
        ) %>%
      filter(price_increase_year_before == 1 | price_increase_year_after == 1) %>%
      distinct(firm_id, price_increase_year_before, price_increase_year_after, .keep_all = TRUE) %>%
      right_join(df) %>%
      replace_na(list(price_increase_year_before = 0, price_increase_year_after = 0)) %>%
      arrange(firm_id, date)
    

    Output

       firm_id delivery_id date       price_increase price_increase_year_before price_increase_year_after
         <dbl>       <dbl> <date>              <dbl>                      <dbl>                     <dbl>
     1       1           1 2004-06-16              0                          1                         0
     2       1           2 2004-08-12              0                          0                         0
     3       1           6 2004-11-22              0                          0                         0
     4       1           9 2005-07-03              1                          0                         0
     5       1          15 2007-01-04              0                          0                         0
     6       2           3 2004-09-07              0                          0                         0
     7       2           5 2005-02-01              0                          0                         0
     8       2          18 2006-01-17              0                          0                         0
     9       3           4 2004-10-11              0                          0                         0
    10       3           7 2005-02-01              0                          1                         0
    11       3           8 2005-04-27              0                          0                         0
    12       3          10 2005-06-01              0                          0                         0
    13       3          11 2005-07-01              0                          0                         0
    14       3          12 2005-08-03              0                          0                         0
    15       3          13 2006-01-03              1                          0                         0
    16       3          17 2007-01-06              0                          0                         1
    17       3          19 2007-03-24              0                          0                         0
    18       3          22 2007-05-03              0                          0                         0
    19       4          14 2006-02-19              0                          0                         0
    20       4          16 2006-06-13              0                          0                         0
    21       4          20 2007-02-04              0                          0                         0
    22       4          21 2007-04-26              0                          0                         0