Search code examples
rlfe

feols - fixest: loop over dependent variable


I am trying to loop over a set of dependent variables using the feols function from the fixest package. In base using lm or in lfe::felm I would simply use the get() function. With fixest, I receive an error. Why is that, and is there a way to get around it? Here is a reproducable example:

library(data.table)
library(lfe)
library(fixest)

N <- 1000
dt <- data.table(
  x1 = rnorm(N),
  x2 = rnorm(N),
  x3 = rnorm(N)
)

beta <- rnorm(3)
dt[, y1 := x1*beta[1] + x2*beta[2] * x3*beta[3] + rnorm(N)]
dt[, y2 := x1*beta[1] + x2*beta[2] * x3*beta[3] + rnorm(N)]
dt[, y3 := x1*beta[1] + x2*beta[2] * x3*beta[3] + rnorm(N)]

dt
beta
depvars <- c("y1", "y2", "y3")

res_lm <- 
lapply(depvars, function(i){
  res <- lm(get(i) ~ x1 + x2 + x3, data = dt)
  summary(res)
})

res_felm <-
lapply(depvars, function(i){
  res <- felm(get(i) ~ x1 + x2 + x3, data = dt)
  summary(res)
})

res_feols <- 
lapply(depvars, function(i){
  res <- feols(get(i) ~ x1 + x2 + x3, data = dt)
  summary(res)
})

# Error in feols(get(i) ~ x1 + x2 + x3, data = dt) : 
# The variable i is in the LHS of the formula but not in the dataset. 

Solution

  • Update

    From fixest version 0.8.0 onwards, you can perform multiple estimations directly:

    res <- feols(c(y1, y2, y3) ~ x1 + x2 + x3, data = dt)
    etable(res)
    

    The previous code performs 3 estimations, one for each dependent variable. Note that you can also have multiple RHS, multiple fixed-effects or multiple samples (see the vignette).


    Update

    From fixest version 0.7 onwards the formula macro parser accepts character vectors. So the following works:

    depvars <- c("y1", "y2", "y3")
    
    lapply(depvars, function(var) {
        res <- feols(xpd(..lhs ~ x1 + x2 + x3, ..lhs = var), data = dt)
        summary(res)
    })
    

    The answer by Allan is perfectly right. Alternatively you can use some tools within fixest toolbox, namely formula macros.

    You can use the function xpd to "expand" a formula. Here's an example:

    depvars <- list(~ y1, ~ y2, ~ y3)
    
    lapply(depvars, function(var) {
        res <- feols(xpd(..lhs ~ x1 + x2 + x3, ..lhs = var), data = dt)
        summary(res)
    })
    

    Note that the dependent variable here must be represented by one sided formulas instead of character strings. The variable ..lhs starts with two dots and is the "macro" variable that will be replaced by var. Note also that macro variables must start with two dots (to distinguish them from regular variables).

    This leads to the following results (same as Allan):

    #> [[1]]
    #> OLS estimation, Dep. Var.: y1
    #> Observations: 1,000 
    #> Standard-errors: Standard 
    #>              Estimate Std. Error   t value  Pr(>|t|)    
    #> (Intercept) -0.026417   0.033482 -0.788981  0.430311    
    #> x1           0.675098   0.033764 19.995000 < 2.2e-16 ***
    #> x2           0.022227   0.032332  0.687468  0.491948    
    #> x3          -0.001915   0.034032 -0.056276  0.955133    
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> Log-likelihood: -1,470.78   Adj. R2: 0.28434 
    #> 
    #> [[2]]
    #> OLS estimation, Dep. Var.: y2
    #> Observations: 1,000 
    #> Standard-errors: Standard 
    #>              Estimate Std. Error   t value  Pr(>|t|)    
    #> (Intercept) -0.028379   0.034692 -0.818031  0.413535    
    #> x1           0.718648   0.034984 20.542000 < 2.2e-16 ***
    #> x2           0.009986   0.033500  0.298072   0.76571    
    #> x3           0.021206   0.035262  0.601372  0.547729    
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> Log-likelihood: -1,506.27   Adj. R2: 0.29582 
    #> 
    #> [[3]]
    #> OLS estimation, Dep. Var.: y3
    #> Observations: 1,000 
    #> Standard-errors: Standard 
    #>              Estimate Std. Error   t value  Pr(>|t|)    
    #> (Intercept) -0.040832   0.034680 -1.177400  0.239316    
    #> x1           0.689918   0.034972 19.728000 < 2.2e-16 ***
    #> x2          -0.017889   0.033489 -0.534170  0.593343    
    #> x3          -0.028022   0.035250 -0.794952  0.426831    
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> Log-likelihood: -1,505.94   Adj. R2: 0.28041 
    

    Just one more note on macros: you can set them globally too with setFixest_fml. The following code would also have worked:

    depvars <- list(~ y1, ~ y2, ~ y3)
    setFixest_fml(..rhs = ~ x1 + x2 + x3)
    
    lapply(depvars, function(i) {
        res <- feols(xpd(..lhs ~ ..rhs, ..lhs = i), data = dt)
        summary(res)
    })
    

    OK, now the very last note. When you use macros that need no redefinition you can avoid the use of xpd within fixest estimation functions. The following would work:

    setFixest_fml(..lhs = ~ y1, ..rhs = ~ x1 + x2 + x3)
    res <- feols(..lhs ~ ..rhs, data = dt)