Omit factor variables in `huxreg`


I am running a regression in R with many time and location fixed effects, and I am trying to output a clean summary table to LaTeX. I switched from the stargazer package to huxtable because stargazer does not behave consistently when omitting fixed effects (see this question).

Here is a simple example:

library(huxtable)

reg1 <- lm(mpg ~ disp, data = mtcars)
reg2 <- lm(mpg ~ disp + factor(gear) + factor(carb), data = mtcars)
huxreg(reg1, reg2) 

The output of huxreg is:

> huxreg(reg1, reg2) 
────────────────────────────────────────────────────
                        (1)              (2)        
                 ───────────────────────────────────
  (Intercept)          29.600 ***       25.533 ***  
                       (1.230)          (2.996)     
  disp                 -0.041 ***       -0.018      
                       (0.005)          (0.011)     
  factor(gear)4                          3.988      
                                        (2.495)     
  factor(gear)5                          5.391 *    
                                        (2.591)     
  factor(carb)2                         -1.979      
                                        (1.667)     
  factor(carb)3                         -4.161      
                                        (2.131)     
  factor(carb)4                         -6.199 *    
                                        (2.221)     
  factor(carb)6                         -8.557 *    
                                        (3.653)     
  factor(carb)8                        -10.389 *    
                                        (4.268)     
                 ───────────────────────────────────
  N                    32               32          
  R2                    0.718            0.828      
  logLik              -82.105          -74.186      
  AIC                 170.209          168.372      
────────────────────────────────────────────────────
  *** p < 0.001; ** p < 0.01; * p < 0.05.           

Column names: names, model1, model2

Here is the desired output:

────────────────────────────────────────────────────
                        (1)              (2)        
                 ───────────────────────────────────
  (Intercept)          29.600 ***       25.533 ***  
                       (1.230)          (2.996)     
  disp                 -0.041 ***       -0.018      
                       (0.005)          (0.011) 
                 ───────────────────────────────────    
  Gear FE                No               Yes
  Carb FE                No               Yes
                 ───────────────────────────────────
  N                    32               32          
  R2                    0.718            0.828      
  logLik              -82.105          -74.186      
  AIC                 170.209          168.372      
────────────────────────────────────────────────────
  *** p < 0.001; ** p < 0.01; * p < 0.05.           

Column names: names, model1, model2

I know I could simply edit the huxtable using add_rows(), but I am looking for a more robust solution that would let me match row names with regular expressions (like stargazer's omit.labels option).


Solution

  • I wrote the answer myself, using this as inspiration.

    The function check_factors() determines whether all of the given terms are present in a model, and sapply() applies it to each model to build the rows that are added to the table. This is not fully automatic, though: I still have to make sure that every variable listed in omit_coefs is also tested by check_factors(). It is possible to omit a variable and then forget to add the corresponding row.

    library(huxtable)
    library(broom)    # tidy()
    library(dplyr)    # %>%, filter(), pull()
    library(stringr)  # str_detect()
    
    reg1 <- lm(mpg ~ disp, data = mtcars)
    reg2 <- lm(mpg ~ disp + factor(gear) + factor(carb), data = mtcars)
    huxreg(reg1, reg2) 
    
    gear_factors <- tidy(reg2) %>%
      filter(str_detect(term, "factor\\(gear\\)")) %>% ## in R, you have to escape the escape, hence \\
      pull(term)
    
    carb_factors <- tidy(reg2) %>%
      filter(str_detect(term, "factor\\(carb\\)")) %>% 
      pull(term)
    
    check_factors <- function(model, factors) {
      return(all(factors %in% (tidy(model) %>% pull(term))))
    }
    
    models_report <- list(reg1 , reg2)
    huxreg(models_report,
           omit_coefs = c(gear_factors, carb_factors)) %>%
      # add the rows, with the TRUE/FALSE returned by check_factors() replaced by "Yes"/"No"
      add_rows(rbind(c("Gear FE", 
                       ifelse(sapply(models_report, 
                                     check_factors, 
                                     factors=gear_factors), 
                              "Yes", "No")), 
                     c("Carb FE", 
                       ifelse(sapply(models_report, 
                                     check_factors, 
                                     factors=carb_factors), 
                              "Yes", "No"))),
               copy_cell_props = FALSE, # this will prevent horizontal lines from appearing
               after = nrow(.) - 5)
    
    

    This produces the following table:

    ────────────────────────────────────────────────────
                            (1)              (2)        
                     ───────────────────────────────────
      (Intercept)          29.600 ***       25.533 ***  
                           (1.230)          (2.996)     
      disp                 -0.041 ***       -0.018      
                           (0.005)          (0.011)     
                     ───────────────────────────────────
      Gear FE          No               Yes             
      Carb FE          No               Yes             
      N                    32               32          
      R2                    0.718            0.828      
      logLik              -82.105          -74.186      
      AIC                 170.209          168.372      
    ────────────────────────────────────────────────────
      *** p < 0.001; ** p < 0.01; * p < 0.05.           
    
    Column names: names, model1, model2
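
    One way to reduce the risk mentioned above (omitting a variable and forgetting its row) is to derive both the omitted terms and the Yes/No rows from a single named vector of regular expressions. The following is only a sketch under a few assumptions: fe_summary() is a hypothetical helper, not part of huxtable; it marks a model "Yes" if any coefficient matches the pattern (check_factors() above requires all of them); and the offset of 5 still assumes the default four summary statistics plus the note row.

    ```r
    # Hypothetical helper (not part of huxtable): builds the list of coefficients
    # to omit and the Yes/No indicator rows from the same regex patterns.
    fe_summary <- function(models, patterns) {
      # Coefficient names of each model, using base R only
      terms <- lapply(models, function(m) names(coef(m)))
      all_terms <- unique(unlist(terms))

      # Every coefficient name matched by at least one pattern
      matched <- Reduce(`|`, lapply(patterns, grepl, x = all_terms))
      omit <- all_terms[matched]

      # One row per pattern: the label, then "Yes"/"No" for each model
      rows <- t(sapply(names(patterns), function(label) {
        has_fe <- vapply(terms,
                         function(tt) any(grepl(patterns[[label]], tt)),
                         logical(1))
        c(label, ifelse(has_fe, "Yes", "No"))
      }))

      list(omit = omit, rows = unname(rows))
    }

    reg1 <- lm(mpg ~ disp, data = mtcars)
    reg2 <- lm(mpg ~ disp + factor(gear) + factor(carb), data = mtcars)
    models <- list(reg1, reg2)

    fe <- fe_summary(models, c("Gear FE" = "^factor\\(gear\\)",
                               "Carb FE" = "^factor\\(carb\\)"))

    # Build the table only if huxtable and broom are installed
    if (requireNamespace("huxtable", quietly = TRUE) &&
        requireNamespace("broom", quietly = TRUE)) {
      ht <- huxtable::huxreg(models, omit_coefs = fe$omit)
      # Assumes 4 statistics rows plus 1 note row below the coefficients
      ht <- huxtable::add_rows(ht, fe$rows, after = nrow(ht) - 5,
                               copy_cell_props = FALSE)
      print(ht)
    }
    ```

    Because fe$omit and fe$rows come from the same patterns, any term that is dropped from the table has a matching indicator row by construction.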