I have a difference-in-difference model which I want to estimate with fixed effects on R. I want to include exposure and treatment interactions to calculate leading and lagging effects.
Basically, I have variables that are collinear and want to choose which one to exclude from my model. A few days ago this would have been really simple with the lfe
package one could just do:
library(data.table)
library(fixest)
crime <- fread("crime.csv")
# Time as a factor for fixed effects
crime[, time := factor(time)]
# Model
xx <- model.matrix(any_crime ~ treatment:time - 1, data = crime) # time is a factor
# Set period 12 as base
xx[, "treatment:time12"] <- 0
# Model with leads and lags
m1 <- felm(any_crime ~ xx | id + time, data = crime)
However, now that lfe
is archived I have turned to the fixest
package which is much faster. However I can't get feols()
to accept my model.matrix
. As doing the same thing as above but replaceing felm()
with feols()
gives the following error:
> feols(any_crime ~ xx | id + time, data = crime)
Error in feols(any_crime ~ xx | id + time, data = crime)
The variable xx is in the RHS of the formula but not in the dataset.
I have read ?feols()
and the formula argument says nothing about model matrices and the details section is one sentence long.
fixest
also has model.matrix.fixest
, I have also checked help for this function and it says it takes a fixest
object as its first argument. I have been unable to use this to create a model that sets period 12 as the base period.
Here is a small sample of data for reproducibility.
Thank you all.
All variables in the formula of fixest
estimations must be in the data set: contrary to lm
or felm
, there is no evaluation from the global environment.
But your case can be easily dealt with with the function i()
. Here's an example from the vignette:
library(fixest)
data(base_did)
est_did = feols(y ~ x1 + i(treat, period, 5) | id + period, base_did)
est_did
#> OLS estimation, Dep. Var.: y
#> Observations: 1,080
#> Fixed-effects: id: 108, period: 10
#> Standard-errors: Clustered (id)
#> Estimate Std. Error t value Pr(>|t|)
#> x1 0.973490 0.045678 21.312000 < 2.2e-16 ***
#> treat:period::1 -1.403000 1.110300 -1.263700 0.209084
#> treat:period::2 -1.247500 1.093100 -1.141200 0.256329
#> treat:period::3 -0.273206 1.106900 -0.246813 0.805526
#> treat:period::4 -1.795700 1.088000 -1.650500 0.101769
#> treat:period::6 0.784452 1.028400 0.762798 0.447262
#> treat:period::7 3.598900 1.101600 3.267100 0.001461 **
#> treat:period::8 3.811800 1.247500 3.055500 0.002837 **
#> treat:period::9 4.731400 1.097100 4.312600 3.6e-05 ***
#> treat:period::10 6.606200 1.120500 5.895800 4.4e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -2,984.58 Adj. R2: 0.48783
#> R2-Within: 0.38963
The function i()
performs an interaction between a continuous variable (although it can be a factor too) and a factor (the 2nd element is always treated as a factor). You can then add the arguments ref
and/or drop
and/or keep
to specify which level from the factor to keep. (The difference between ref
and drop
is subtle: ref
accepts only one value [drop
accepts many] and that reference is highlighted when using coefplot
on the estimation.)
In the previous example, we have ref = 5
, so the 5th period will be excluded from the interaction.
Back to your example, the following should work (without creating any extra data):
feols(any_crime ~ i(treatment, time, 12) | id + time, data = crime)
It's then easy to rebase or drop several:
feols(any_crime ~ i(treatment, time, 10) | id + time, data = crime)
feols(any_crime ~ i(treatment, time, drop = 10:12) | id + time, data = crime)