Search code examples
routputlinear-regressionsummarybroom

R: create publishable tables of several regression outputs


I´ve ben analyzing data for a paper and have now obtained results in mutiple linear regression. However the summaries R provides are not really fir for publication in the final paper. Also I have specified one variable in several different ways, to showcase the robustness of the results.

Hw can I create a nice, exportable table in R, that contains Variable Names (ideally also allows to name the variables in a more informative way), estimates, standard errors, robust standard errors p values and ideally also the significance indicators? For illustration:

I have summary outputs like this:

Residuals:
    Min      1Q  Median      3Q     Max 
-50.868  -4.644   1.583   7.054  20.490 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)  
(Intercept)               4.710e+01  1.848e+01   2.549   0.0136 *
Var1                     -8.588e-01  2.201e+00  -0.390   0.6979  
Var2                      2.486e+00  1.055e+00   2.357   0.0220 *
log(specification1)       3.376e+00  2.152e+00   1.569   0.1223  
Var4                     -3.651e-04  2.797e-04  -1.305   0.1971  
Var5                      4.809e+00  2.654e+00   1.812   0.0753 .
Var6                     -8.706e+00  6.972e+00  -1.249   0.2170  
Var7                     -8.172e+00  5.755e+00  -1.420   0.1612  
Var8                     -3.276e+00  7.067e+00  -0.463   0.6448  
Var9                     -1.477e+01  7.849e+00  -1.882   0.0650 .

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

and

Residuals:
    Min      1Q  Median      3Q     Max 
-48.881  -5.699   0.956   8.947  17.888 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)  
(Intercept)               4.258e+01  1.750e+01   2.405   0.0195 *
Var1                      4.298e-01  2.120e+00   0.200   0.8421  
Var2                      5.179e+00  1.027e+00   2.122   0.0271 *
log(specification 2)      2.050e+00  9.435e-01   2.173   0.0338 *
Var4                     -1.420e-04  2.261e-04  -1.513   0.1356  
Var5                      4.584e+00  2.511e+00   1.826   0.0730 .

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

and I would like to get to a table looking something like this:

             Model1                                             Model2
Intercept    Estimate  Std.Error  p robust_Std.Error robust_p  Estimate  Std.Error  p  robust ...
Var1
Var2
Var3
Var4
Var5
Var6
Var7
Var8
Var9

which in the columns of course contains the values of the estimates. Is there a function/ package that does that nicely?

Thanks in advance


Solution

  • I suggest you to use the broom package, like this:

    fit1 <- lm(mpg ~ ., mtcars)
    broom::tidy(fit1)
    
    # # A tibble: 11 x 5
    #    term        estimate std.error statistic p.value
    #    <chr>          <dbl>     <dbl>     <dbl>   <dbl>
    #  1 (Intercept)  12.3      18.7        0.657  0.518 
    #  2 cyl          -0.111     1.05      -0.107  0.916 
    #  3 disp          0.0133    0.0179     0.747  0.463 
    #  4 hp           -0.0215    0.0218    -0.987  0.335 
    #  5 drat          0.787     1.64       0.481  0.635 
    #  6 wt           -3.72      1.89      -1.96   0.0633
    #  7 qsec          0.821     0.731      1.12   0.274 
    #  8 vs            0.318     2.10       0.151  0.881 
    #  9 am            2.52      2.06       1.23   0.234 
    # 10 gear          0.655     1.49       0.439  0.665 
    # 11 carb         -0.199     0.829     -0.241  0.812 
    

    It will extract a tibble from the output of the lm function.


    If you have more than one model and you wanna set all the tibbles together with common terms you can deal with it this way:

    Create a list x of your models.

    fit1 <- lm(mpg ~ cyl + disp + gear, mtcars)
    fit2 <- lm(mpg ~ cyl + hp   + drat, mtcars)
    
    x <- list(fit1, fit2)
    

    You can use this solution:

    library(purrr)
    library(dplyr)
    library(stringr)
    
    # set names for the list
    names(x) <- paste("Model", seq_along(x), sep = "_")
    
    # tidy them up
    x <- map(x, broom::tidy)
    
    # set the list names at the beginning of each column
    x <- imap(x, ~set_names(.x, paste(.y, names(.x), sep = "_")))
    
    # rename each term column as "term"
    x <- map(x, ~rename_with(.x, str_replace, pattern = ".*term", replacement = "term"))
    
    # join them all together
    reduce(x, full_join, by = "term")
    

    It returns the output you asked for:

    # A tibble: 6 x 9
      term        Model1_estimate Model1_std.error Model1_statistic Model1_p.value Model2_estimate Model2_std.error Model2_statistic Model2_p.value
      <chr>                 <dbl>            <dbl>            <dbl>          <dbl>           <dbl>            <dbl>            <dbl>          <dbl>
    1 (Intercept)         34.0              4.76              7.13    0.0000000925         22.5              7.99               2.82        0.00880
    2 cyl                 -1.59             0.724            -2.20    0.0366               -1.36             0.735             -1.85        0.0747 
    3 disp                -0.0200           0.0109           -1.83    0.0774               NA               NA                 NA          NA      
    4 gear                 0.158            0.910             0.174   0.863                NA               NA                 NA          NA      
    5 hp                  NA               NA                NA      NA                    -0.0288           0.0153            -1.88        0.0704 
    6 drat                NA               NA                NA      NA                     2.84             1.52               1.87        0.0725 
    

    If your list has more than two models, the code will be stable.