
Extracting coefficients from 100s of regressions using one split regression rather than a loop?


I need to run over 600 regressions, each on a different group of the data. The groups are mutually exclusive and collectively exhaustive (MECE), and group takes values {1,2,...,623}. From each regression, I need to store the coefficient estimates for all independent variables. I was able to do this by looping over the regressions (see below), but this is quite slow and I believe there is a better way:

# loop prep
library(fixest)  # provides feols()
formula <- "dv ~ iv_1 + iv_2 + iv_3 | fe"
ols_stored_coef <- matrix(0, 623, 3)
ols_stored_coef <- as.data.frame(ols_stored_coef)

# loop
for(i in 1:623) {
 #run regression:
 ols <- feols(as.formula(formula), subset(df, group==i))
 # generate coefficients:
 ols_coef <- summary(ols)$coefficients
 ols_coef <- data.frame(as.list(ols_coef))
 # store coefficients:
 ols_stored_coef[i,1] = ols_coef[1,1]
 ols_stored_coef[i,2] = ols_coef[1,2]
 ols_stored_coef[i,3] = ols_coef[1,3]
}

This works, but it takes about 10 minutes to run (there are around 6 million observations and 623 MECE groups). However, I know that the following command estimates all 623 regressions in about 1 minute:

ols_split <- feols(as.formula(formula), df, split=~group)

Regression data is stored all together in a single "List of 623." I am able to extract coefficients per group via the following, where X is the group value.

ols_split$`sample.var: group; sample: X`$coefficients

In an ideal world, I could run this split feols(), and then store the coefficients via looping:

for(i in 1:623) {
  ols_coef <- ols_split$`sample.var: group; sample: i`$coefficients
  ols_coef <- data.frame(as.list(ols_coef))
  # store coefficients:
  ols_stored_coef[i,1] = ols_coef[1,1]
  ols_stored_coef[i,2] = ols_coef[1,2]
  ols_stored_coef[i,3] = ols_coef[1,3]
}

However, because i sits inside the backticks, I believe it is being read as literal text rather than as the loop variable, so this does not work.

Is there any way I can use the ols_split List of 623 regression results to extract coefficients?


Solution

  • The fixest package that you've used has built-in functions to support this. Here's my example, based on yours:

    df <- tibble::tibble(
      dv = rnorm(1000),
      iv_1 = rnorm(1000),
      iv_2 = rnorm(1000),
      iv_3 = rnorm(1000),
      fe   = 1,
      group = sample(LETTERS, 1000, replace  = TRUE)
    )
    
    formula <- "dv ~ iv_1 + iv_2 + iv_3 | fe"
    ols_stored_coef <- matrix(0, 623, 3)
    ols_stored_coef <- as.data.frame(ols_stored_coef )
    ols_split <- fixest::feols(as.formula(formula), df, split=~group)
    
    out <- fixest::coeftable(ols_split) 
    head(out)
    
      id sample.var sample coefficient    Estimate Std. Error    t value  Pr(>|t|)
    1  1      group      A        iv_1 -0.04816492  0.2019670 -0.2384791 0.8133102
    2  1      group      A        iv_2 -0.18081949  0.1982410 -0.9121193 0.3697786
    3  1      group      A        iv_3  0.04826683  0.1961902  0.2460206 0.8075269
    4  2      group      B        iv_1 -0.15561382  0.1824392 -0.8529625 0.3993197
    5  2      group      B        iv_2  0.06064802  0.2348541  0.2582370 0.7976946
    6  2      group      B        iv_3 -0.07948869  0.1981408 -0.4011728 0.6906643
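
On the loop in the question: a name written between backticks is taken literally, so `sample: i` looks for an element whose name ends in the letter i, not in the current value of i. A minimal base-R sketch of the usual workaround is to build the name as a string and index with [[ ]]. The list fake_split below is a hypothetical stand-in with made-up numbers, not a real feols() result, and whether a fixest multi-estimation object accepts a character key this way is an assumption to check against your fixest version:

```r
# Hypothetical stand-in for the split results: a plain named list whose
# names mimic the "sample.var: ...; sample: ..." pattern from the question.
fake_split <- list(
  `sample.var: group; sample: 1` = list(coefficients = c(iv_1 = 0.1, iv_2 = 0.2, iv_3 = 0.3)),
  `sample.var: group; sample: 2` = list(coefficients = c(iv_1 = 0.4, iv_2 = 0.5, iv_3 = 0.6))
)

# One row per group, one column per coefficient.
stored <- matrix(NA_real_, nrow = 2, ncol = 3,
                 dimnames = list(NULL, c("iv_1", "iv_2", "iv_3")))

for (i in 1:2) {
  # Build the element name from the loop variable, then look it up with [[ ]].
  key <- paste0("sample.var: group; sample: ", i)
  stored[i, ] <- fake_split[[key]]$coefficients
}

stored
```

The point is only that [[ ]] evaluates its argument, while $ followed by a backticked name does not.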
    
    

    If the long format returned by coeftable() isn't what you want and you really do want a matrix with one row per group, that takes only a little wrangling from here, e.g.

    m <- matrix(out$Estimate, ncol = length(unique(out$coefficient)), byrow = TRUE)
    colnames(m) <- unique(out$coefficient)
    rownames(m) <- unique(out$sample)
    
    head(m)
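
One caveat with matrix(..., byrow = TRUE): it assumes every group reports the same coefficients in the same order. A sketch of a reshaping that matches on names instead, using base R's tapply(), is below. The small out data frame is made up for illustration (group B deliberately lacks iv_3). With real results you would pass the coeftable() output instead, assuming it has the sample, coefficient and Estimate columns shown above:

```r
# Made-up long-format table with the same column names as the output above.
# Group B is missing iv_3 on purpose, e.g. as if it had been dropped.
out <- data.frame(
  sample      = c("A", "A", "A", "B", "B"),
  coefficient = c("iv_1", "iv_2", "iv_3", "iv_1", "iv_2"),
  Estimate    = c(-0.048, -0.181, 0.048, -0.156, 0.061)
)

# Cross-tabulate by name: rows are groups, columns are coefficients.
# Each cell holds at most one estimate, so taking the first element is enough;
# combinations that do not occur are filled with NA.
m <- tapply(out$Estimate,
            list(out$sample, out$coefficient),
            function(x) x[1])

m
```

Rows and columns come out in sorted order of the group and coefficient names, which may differ from the order in which they appear in the table.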