I need to run over 600 regressions, each on a different MECE (mutually exclusive, collectively exhaustive) group of the data (group takes values {1,2,...,623}). From each regression, I need to store the coefficient estimates for all independent variables. I was able to do this by looping over the regressions (see below), but this is quite slow and I believe there is a better way:
# loop prep (feols() comes from the fixest package)
library(fixest)
formula <- "dv ~ iv_1 + iv_2 + iv_3 | fe"
ols_stored_coef <- matrix(0, 623, 3)
ols_stored_coef <- as.data.frame(ols_stored_coef )
# loop
for(i in 1:623) {
#run regression:
ols <- feols(as.formula(formula), subset(df, group==i))
# generate coefficients:
ols_coef <- summary(ols)$coefficients
ols_coef <- data.frame(as.list(ols_coef))
# store coefficients:
ols_stored_coef[i,1] = ols_coef[1,1]
ols_stored_coef[i,2] = ols_coef[1,2]
ols_stored_coef[i,3] = ols_coef[1,3]
}
This works, but it takes about 10 minutes to run (there are around 6 million observations and 623 MECE groups). However, I know that the following command estimates all 623 regressions in about 1 minute:
ols_split <- feols(as.formula(formula), df, split=~group)
The results are stored together in a single "List of 623". I can extract the coefficients for one group with the following, where X is the group value:
ols_split$`sample.var: group; sample: X`$coefficients
In an ideal world, I could run this split feols(), and then store the coefficients via looping:
for(i in 1:623) {
ols_coef <- ols_split$`sample.var: group; sample: i`$coefficients
ols_coef <- data.frame(as.list(ols_coef))
# store coefficients:
ols_stored_coef[i,1] = ols_coef[1,1]
ols_stored_coef[i,2] = ols_coef[1,2]
ols_stored_coef[i,3] = ols_coef[1,3]
}
However, because i sits inside backticks, I believe it is read as literal text rather than as the loop variable, so the lookup fails.
Is there any way I can use the ols_split List of 623 regression results to extract coefficients?
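On the backtick problem itself: a name wrapped in backticks is taken literally, so `i` is never substituted. A minimal sketch of the usual workaround is to build the element name as a string and index with `[[ ]]`. The list below is a hypothetical stand-in for ols_split (a plain named list with made-up coefficients), and it assumes the element names follow the `sample.var: group; sample: X` pattern; check names(ols_split) on the real object before relying on that.

```r
# Hypothetical stand-in for ols_split: a named list whose elements each
# carry a $coefficients vector (values are made up for illustration).
groups <- 1:3
fake_split <- lapply(groups, function(g) {
  list(coefficients = c(iv_1 = g * 0.1, iv_2 = g * 0.2, iv_3 = g * 0.3))
})
names(fake_split) <- paste0("sample.var: group; sample: ", groups)

# Pre-allocate the result: one row per group, one column per coefficient.
stored <- matrix(NA_real_, nrow = length(groups), ncol = 3,
                 dimnames = list(as.character(groups),
                                 c("iv_1", "iv_2", "iv_3")))

for (i in groups) {
  # Build the element name as a string so that i is substituted,
  # then index with [[ ]] instead of $`...`.
  nm <- paste0("sample.var: group; sample: ", i)
  stored[i, ] <- fake_split[[nm]]$coefficients
}

stored
```

The same `[[nm]]` lookup should carry over to the real list as long as the constructed string matches an entry of names(ols_split) exactly.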
The fixest package that you've used has some built-in functions to support this. Here's my example based on yours:
df <- tibble::tibble(
dv = rnorm(1000),
iv_1 = rnorm(1000),
iv_2 = rnorm(1000),
iv_3 = rnorm(1000),
fe = 1,
group = sample(LETTERS, 1000, replace = TRUE)
)
formula <- "dv ~ iv_1 + iv_2 + iv_3 | fe"
ols_stored_coef <- matrix(0, 623, 3)
ols_stored_coef <- as.data.frame(ols_stored_coef )
ols_split <- fixest::feols(as.formula(formula), df, split=~group)
out <- fixest::coeftable(ols_split)
head(out)
  id sample.var sample coefficient    Estimate Std. Error    t value  Pr(>|t|)
1  1      group      A        iv_1 -0.04816492  0.2019670 -0.2384791 0.8133102
2  1      group      A        iv_2 -0.18081949  0.1982410 -0.9121193 0.3697786
3  1      group      A        iv_3  0.04826683  0.1961902  0.2460206 0.8075269
4  2      group      B        iv_1 -0.15561382  0.1824392 -0.8529625 0.3993197
5  2      group      B        iv_2  0.06064802  0.2348541  0.2582370 0.7976946
6  2      group      B        iv_3 -0.07948869  0.1981408 -0.4011728 0.6906643
Of course, if this format isn't what you want and you really do want a matrix, that's trivial with some wrangling from here, e.g.:
m <- matrix(out$Estimate, ncol = length(unique(out$coefficient)), byrow = TRUE)
colnames(m) <- unique(out$coefficient)
rownames(m) <- unique(out$sample)
head(m)
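One caveat with the byrow reshaping: it relies on every group reporting the same coefficients in the same order, which may not hold if a variable is dropped in some group (for example because of collinearity). A minimal sketch of an alternative that matches on the sample and coefficient labels instead, using base R's tapply(). The data frame here is a hand-built stand-in for the coeftable() output, filled with the estimates printed above for groups A and B (rounded to four decimals).

```r
# Hand-built stand-in for the coeftable() output, using the estimates
# printed above for groups A and B (rounded to 4 decimals).
out <- data.frame(
  sample      = rep(c("A", "B"), each = 3),
  coefficient = rep(c("iv_1", "iv_2", "iv_3"), times = 2),
  Estimate    = c(-0.0482, -0.1808, 0.0483, -0.1556, 0.0606, -0.0795)
)

# One row per group, one column per coefficient. Each cell holds the single
# matching estimate; a missing group/coefficient pair becomes NA.
m <- tapply(out$Estimate, list(out$sample, out$coefficient),
            function(x) x[1])
m
```

Because cells are filled by label rather than by position, a group with a missing coefficient shows up as an NA in that column instead of shifting the remaining values.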