How do I conduct hundreds of regressions, extract statistics and output these into a table?

I need to conduct hundreds of multiple regression models on several variables in my dataset. There are too many to run manually and I need to extract their results into one big table.

My real dataset contains data on blood protein concentrations but I have rewritten my code very simply for the mtcars dataset. In my dataset, p refers to one protein at a time of ~300 total proteins but here refers to columns/variables 2:5 of the mtcars dataset. The following code produces a long-form table containing the stats for each main effect and interaction.

library(tidyverse)

models <- lapply(mtcars[2:5], function(p) 
  (summary(lm(mpg ~ p * wt + qsec, data = mtcars))))
models_stats <- lapply(models, tidy) 
models_table <- do.call(rbind, models_stats)
models_table

OUTPUT: 
# A tibble: 20 × 5
   term        estimate std.error statistic     p.value
 * <chr>          <dbl>     <dbl>     <dbl>       <dbl>
 1 (Intercept)  41.6      8.20        5.08  0.0000247  
 2 p            -3.39     0.963      -3.52  0.00154    
 3 wt          -10.8      2.40       -4.51  0.000113   
 4 qsec          0.757    0.349       2.17  0.0390     
 5 p:wt          0.977    0.317       3.08  0.00473    
 6 (Intercept)  30.4      5.82        5.22  0.0000167  
 7 p            -0.0378   0.0138     -2.73  0.0109     
 8 wt           -7.61     1.26       -6.04  0.00000188 
 9 qsec          0.780    0.291       2.68  0.0123     
10 p:wt          0.0106   0.00298     3.55  0.00143    
11 (Intercept)  40.3      7.68        5.25  0.0000156  
12 p            -0.106    0.0263     -4.04  0.000395   
13 wt           -8.68     1.29       -6.72  0.000000328
14 qsec          0.503    0.361       1.39  0.174      
15 p:wt          0.0278   0.00730     3.81  0.000733   
16 (Intercept)  -7.79    11.3        -0.692 0.495      
17 p             7.52     2.81        2.68  0.0124     
18 wt            2.80     3.21        0.873 0.390      
19 qsec          0.874    0.246       3.55  0.00142    
20 p:wt         -2.12     0.926      -2.29  0.0301

I am only interested in the interaction and so this is the only coefficient needing to be extracted. How can I: a) remove the p, wt, and qsec rows and keep only the q:wt row? b) make each row identifiable by its variable (i.e. protein name)? c) convert this into an easy to read table

Solution

You could try using grep for the interaction and then rename according to what you specify in your lapply:

short_table <- models_table[grep("\\:", models_table$term),]
short_table["p_name"] <- names(mtcars[2:5])

Output:

term  estimate std.error statistic  p.value p_name
  <chr>    <dbl>     <dbl>     <dbl>    <dbl> <chr> 
1 p:wt    0.977    0.317        3.08 0.00473  cyl   
2 p:wt    0.0106   0.00298      3.55 0.00143  disp  
3 p:wt    0.0278   0.00730      3.81 0.000733 hp    
4 p:wt   -2.12     0.926       -2.29 0.0301   drat

Alternatively you could modify the lapply statement and ignore the broom::tidy(). For instance:

models2 <- lapply(mtcars[2:5], function(p) 
  (summary(lm(mpg ~ p * wt + qsec, data = mtcars)))$coefficients["p:wt",])

models2_table <- do.call(rbind, models2)

Which would provide all the statistics for the interaction term:

      Estimate Std. Error  t value    Pr(>|t|)
cyl  0.7571610  0.3490572 2.169160 0.039044270
disp 0.7796404  0.2905776 2.683071 0.012301653
hp   0.5031633  0.3607684 1.394699 0.174476446
drat 0.8739726  0.2458620 3.554728 0.001418666