I am trying to compute robust standard errors in R. I am aware of two solutions that do what I want, but are incredibly slow. Hence my questions is whether there's a way that is more efficient. E.g. something that has been coded up in Rcpp.
My context is that I am fitting a model with a large number of variables (fixed effects). I am however not interested in these coefficients, I only care about inference one single coefficient (that of X in the example below).
Fast Solution
???
Slow Solution 1
library(sandwich)
lmfe<-lm(Y ~ X + factor(strata_ids))
coeftest(lmfe, vcov = vcovHC(lmfe, "HC1"))
Slow Solution 2
The manual solution I got from here is:
summaryw <- function(model) {
s <- summary(model)
X <- model.matrix(model)
u2 <- residuals(model)^2
XDX <- 0
## Here one needs to calculate X'DX. But due to the fact that
## D is huge (NxN), it is better to do it with a cycle.
for(i in 1:nrow(X)) {
XDX <- XDX + u2[i]*X[i,]%*%t(X[i,])
}
# inverse(X'X)
XX1 <- solve(t(X)%*%X)
# Variance calculation (Bread x meat x Bread)
varcovar <- XX1 %*% XDX %*% XX1
# degrees of freedom adjustment
dfc <- sqrt(nrow(X))/sqrt(nrow(X)-ncol(X))
# Standard errors of the coefficient estimates are the
# square roots of the diagonal elements
stdh <- dfc*sqrt(diag(varcovar))
t <- model$coefficients/stdh
p <- 2*pnorm(-abs(t))
results <- cbind(model$coefficients, stdh, t, p)
dimnames(results) <- dimnames(s$coefficients)
results
}
This question already has one good answer (i.e. use lfe::felm()
).
For an even faster method, try the new fixest package. Using the OPs example,
library(fixest)
mod = feols(Y ~ X | strata_ids, data = dat)
## SEs are automatically clustered by the strata_ids FE
mod
## We can compute other SEs on the fly with summary.fixest(), e.g.
summary(mod, se = 'standard') ## vanilla
summary(mod, se = 'white') ## HC
# etc
The more general lesson is to avoid modeling fixed-effects as factors in R... or any other language TBH. This is equivalent to the DV approach and will always be slow. Instead, you'll want to use a use a purpose-built package that leverages FWL or some other optimised estimation method.