
Efficient creation of dummy variables in fixed effects regression


I have daily panel data for 34 countries recording the days on which each country committed military aid and the amount in €. I am running a fixed effects regression to study how the amount of this aid changes over time depending on an independent dummy variable that codes the recipient's use of this military aid as successful (1) or not visibly successful (0). This independent variable is derived from the date column.

To be clear, I want to run a regression with state fixed effects.

Since the time unit is days, my problem is that I believe the plm function needs a dummy variable for each day on which each country has not given any military aid, meaning that I would need 365 dummy variables per year for each of the 34 donor countries.

Since the plm function does not handle NA values, I have had to recode the "empty" days without committed aid as the state "none". However, this causes R to treat "none" as a state of its own that never gives any aid.

Currently, my dataset looks like this:

State     an_date   val_eur   success
Belgium   22/02/26  7600000   0
Slovakia  22/02/26  11000000  0
none      22/02/27  0         0

When I then run this plm model, the results are insignificant and the coefficient has the opposite sign to what previous data would suggest. t_sq is a squared time control variable.

plm(val_eur ~ success + t_sq, index="state", model="within", data=df)

I would highly appreciate any ideas on how to create, or make R interpret, all the dummy variables required for the regression!

I have tried looking inside the plm function for ways to make it create dummy variables the same way it creates dummy variables for the country fixed effects (by using index="state"), but I have not found any way.

Manually coding the dataset and adding approx 34*365 dummy variables seems like a bit of a coding nightmare.

EDIT: Some more info. When I use factor() to group by days, I get the error message "non-unique values when setting 'row.names'", since several countries commit aid on the same dates.

MRE BELOW

Note that for some reason I get an error message when I try this plm regression saying that the model is empty. I do not get this error message in the original model.

# Load lubridate for interval(), ymd() and %within%
library(lubridate)

# Creating some base example data
# ("NA" here is a literal string standing in for a day without aid)
state <- c("Belgium", "Slovakia", "NA")
an_date <- as.Date(c("26/02/2022", "26/02/2022", "27/02/2022"), format = "%d/%m/%Y")
val_eur <- c(7600000, 11000000, 0)
df <- data.frame(state, an_date, val_eur)

# Creation of a variable giving the number of days since the invasion
inv_date <- as.Date("2022-02-24")
df$t <- as.numeric(difftime(df$an_date, inv_date, units = "days"))
# Creation of a squared time control variable for the regression
df$t_sq <- df$t^2

# Creating the time interval that the independent dummy variable uses.
# bse means "battlefield success effects" and marks a 30 day time period
# during which the independent variable takes the value 1.
bse <- interval(ymd("2022-02-27"), ymd("2022-03-04"))
df$bse <- df$an_date %within% bse
# Translating the TRUE/FALSE values to a dummy column for battlefield success effects
df$bse <- as.integer(df$bse)

# Attempt at regression
library(plm)

fe_mod <- plm(val_eur ~ bse + t, index = c("state"),
              model = "within", data = df)

Solution

  • Your issue might be that your panel is not balanced. Something along these lines might be helpful:

    #data
    df <- structure(list(state = c("Belgium", "Slovakia", "NA"), an_date = structure(c(19049,
    19049, 19050), class = "Date"), val_eur = c(7600000, 1.1e+07,
    0), t = c(2, 2, 3), t_sq = c(4, 4, 9), bse = c(0L, 0L, 1L)), row.names = c(NA,
    -3L), class = "data.frame")
    
    
    # libraries
    library(lubridate)
    library(tidyverse)

    ## unique states (dropping the "NA" placeholder) and unique days
    all_states <- df %>% filter(state != 'NA') %>% distinct(state)
    all_dates  <- df %>% distinct(an_date)

    ## expand to grid with all date/state combinations
    ## (stringsAsFactors = FALSE keeps state as character so it joins cleanly)
    x <- expand.grid(state = all_states$state, an_date = all_dates$an_date,
                     stringsAsFactors = FALSE)

    ## join the observed rows onto the grid and fill out NA's
    balanced_panel_df <- x %>%
      left_join(df, by = c('an_date', 'state')) %>%
      mutate(t = ifelse(is.na(t), as.numeric(an_date - as.Date('2022-02-24')), t),
             t_sq = ifelse(is.na(t_sq), t^2, t_sq),
             val_eur = ifelse(is.na(val_eur), 0, val_eur),
             bse = ifelse(an_date >= as.Date("2022-02-27") & an_date <= as.Date("2022-03-04"), 1, 0))
    

    The balanced panel looks as follows:

    > balanced_panel_df
         state    an_date val_eur t t_sq bse
    1  Belgium 2022-02-26 7.6e+06 2    4   0
    2 Slovakia 2022-02-26 1.1e+07 2    4   0
    3  Belgium 2022-02-27 0.0e+00 3    9   1
    4 Slovakia 2022-02-27 0.0e+00 3    9   1
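
    The grid above only contains days on which at least one country announced aid. If a row for every calendar day is wanted instead, tidyr's complete() can build the full state-by-day grid in one step. The following is a minimal sketch rather than a definitive recipe: it assumes the column names from the question, rebuilds the toy data so it runs on its own, and uses an end date (end_date) that is chosen here purely for illustration.

    ```r
    library(dplyr)
    library(tidyr)

    # toy data mirroring the question's example ("NA" is a placeholder string)
    df <- data.frame(
      state   = c("Belgium", "Slovakia", "NA"),
      an_date = as.Date(c("2022-02-26", "2022-02-26", "2022-02-27")),
      val_eur = c(7600000, 11000000, 0)
    )

    inv_date <- as.Date("2022-02-24")  # invasion date, as in the question
    end_date <- as.Date("2022-02-27")  # assumed last day of the sample (illustrative)

    daily_panel <- df %>%
      filter(state != "NA") %>%                           # drop the placeholder "state"
      complete(state,
               an_date = seq(inv_date, end_date, by = "day"),
               fill = list(val_eur = 0)) %>%              # days without aid get 0 EUR
      mutate(t    = as.numeric(an_date - inv_date),       # days since invasion
             t_sq = t^2,
             bse  = as.integer(an_date >= as.Date("2022-02-27") &
                               an_date <= as.Date("2022-03-04")))
    ```

    With this layout each state has one row per day, so no placeholder state is needed and days without aid enter the regression as zeros.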
    

    Here's how you could run the regression with state fixed effects:

    library(fixest)
    feols(val_eur ~ bse + t | state, data=balanced_panel_df)
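
    Since the question asked about plm, the same state fixed effects model can also be sketched there. The example below is an illustration under assumptions: toy is a made-up panel with invented amounts (large enough for the coefficients to be identified), and both the unit and the time column are passed to index. It also fits the equivalent regression with explicit state dummies via lm(), to illustrate that the within estimator gives the same slopes, so the dummies do not have to be created by hand.

    ```r
    library(plm)

    # made-up toy panel: 3 states x 6 days, amounts invented for illustration
    toy <- expand.grid(state = c("Belgium", "Slovakia", "Norway"),
                       an_date = seq(as.Date("2022-02-25"), by = "day", length.out = 6),
                       stringsAsFactors = FALSE)
    toy$t   <- as.numeric(toy$an_date - as.Date("2022-02-24"))
    toy$bse <- as.integer(toy$an_date >= as.Date("2022-02-27") &
                          toy$an_date <= as.Date("2022-02-28"))
    toy$val_eur <- c(5, 0, 2, 0, 3, 0, 9, 4, 6, 7, 8, 1, 0, 2, 5, 3, 0, 4) * 1e6

    # within (fixed effects) estimator: state effects are handled by demeaning
    fe_plm <- plm(val_eur ~ bse + t, data = toy,
                  index = c("state", "an_date"), model = "within")

    # same model with explicit state dummies (least squares dummy variables)
    fe_lm <- lm(val_eur ~ bse + t + factor(state), data = toy)

    coef(fe_plm)                  # slopes from the within estimator
    coef(fe_lm)[c("bse", "t")]    # matching slopes from the dummy regression
    ```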
    

    If you really want time fixed effects as well, you can use:

    balanced_panel_df$t <- as.factor(balanced_panel_df$t)
    feols(val_eur ~ bse| state + t, data=balanced_panel_df)
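
    One caveat is worth checking before adding day fixed effects: bse as constructed above depends only on the date, so a full set of day dummies is perfectly collinear with it and its coefficient is not separately identified. The base R sketch below illustrates this with a hypothetical toy data set (invented amounts, not the asker's data); bse is placed last in the formula so that lm() marks it as the aliased term.

    ```r
    # hypothetical toy panel: 2 states x 4 days, amounts invented for illustration
    toy <- expand.grid(state = c("Belgium", "Slovakia"), t = 1:4,
                       stringsAsFactors = FALSE)
    toy$bse <- as.integer(toy$t >= 3)          # dummy that varies only over time
    toy$val_eur <- c(5, 0, 2, 0, 3, 7, 9, 4) * 1e6

    # state dummies + day dummies + bse: bse equals the sum of the day-3 and day-4 dummies
    m <- lm(val_eur ~ factor(state) + factor(t) + bse, data = toy)
    coef(m)["bse"]   # NA, because bse is aliased by the day dummies
    ```

    If bse also varied across states on a given day, this problem would not arise.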