Search code examples
restimation

How to make a model equation in R fixest and choose factor interactions


I have a difference-in-difference model which I want to estimate with fixed effects on R. I want to include exposure and treatment interactions to calculate leading and lagging effects.

Basically, I have variables that are collinear and want to choose which one to exclude from my model. A few days ago this would have been really simple with the lfe package one could just do:

library(data.table)
library(fixest)

crime <- fread("crime.csv")

# Time as a factor for fixed effects
crime[, time := factor(time)]

# Model
xx <- model.matrix(any_crime ~ treatment:time - 1, data = crime) # time is a factor

# Set period 12 as base
xx[, "treatment:time12"] <- 0

# Model with leads and lags
m1 <- felm(any_crime ~ xx | id + time, data = crime)

However, now that lfe is archived I have turned to the fixest package which is much faster. However I can't get feols() to accept my model.matrix. As doing the same thing as above but replaceing felm() with feols() gives the following error:

> feols(any_crime ~ xx | id + time, data = crime)
Error in feols(any_crime ~ xx | id + time, data = crime)
  The variable xx is in the RHS of the formula but not in the dataset.

I have read ?feols() and the formula argument says nothing about model matrices and the details section is one sentence long.

fixest also has model.matrix.fixest, I have also checked help for this function and it says it takes a fixest object as its first argument. I have been unable to use this to create a model that sets period 12 as the base period.


Here is a small sample of data for reproducibility.

Thank you all.


Solution

  • All variables in the formula of fixest estimations must be in the data set: contrary to lm or felm, there is no evaluation from the global environment.

    But your case can be easily dealt with with the function i(). Here's an example from the vignette:

    library(fixest)
    data(base_did)
    est_did = feols(y ~ x1 + i(treat, period, 5) | id + period, base_did)
    est_did
    #> OLS estimation, Dep. Var.: y
    #> Observations: 1,080 
    #> Fixed-effects: id: 108,  period: 10
    #> Standard-errors: Clustered (id) 
    #>                   Estimate Std. Error   t value  Pr(>|t|)    
    #> x1                0.973490   0.045678 21.312000 < 2.2e-16 ***
    #> treat:period::1  -1.403000   1.110300 -1.263700  0.209084    
    #> treat:period::2  -1.247500   1.093100 -1.141200  0.256329    
    #> treat:period::3  -0.273206   1.106900 -0.246813  0.805526    
    #> treat:period::4  -1.795700   1.088000 -1.650500  0.101769    
    #> treat:period::6   0.784452   1.028400  0.762798  0.447262    
    #> treat:period::7   3.598900   1.101600  3.267100  0.001461 ** 
    #> treat:period::8   3.811800   1.247500  3.055500  0.002837 ** 
    #> treat:period::9   4.731400   1.097100  4.312600   3.6e-05 ***
    #> treat:period::10  6.606200   1.120500  5.895800   4.4e-08 ***
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> Log-likelihood: -2,984.58   Adj. R2: 0.48783 
    #>                           R2-Within: 0.38963
    

    The function i() performs an interaction between a continuous variable (although it can be a factor too) and a factor (the 2nd element is always treated as a factor). You can then add the arguments ref and/or drop and/or keep to specify which level from the factor to keep. (The difference between ref and drop is subtle: ref accepts only one value [drop accepts many] and that reference is highlighted when using coefplot on the estimation.)

    In the previous example, we have ref = 5, so the 5th period will be excluded from the interaction.

    Back to your example, the following should work (without creating any extra data):

    feols(any_crime ~ i(treatment, time, 12) | id + time, data = crime)
    

    It's then easy to rebase or drop several:

    feols(any_crime ~ i(treatment, time, 10) | id + time, data = crime)
    feols(any_crime ~ i(treatment, time, drop = 10:12) | id + time, data = crime)