I need to conduct hundreds of multiple regression models on several variables in my dataset. There are too many to run manually and I need to extract their results into one big table.
My real dataset contains data on blood protein concentrations but I have rewritten my code very simply for the mtcars dataset. In my dataset, p
refers to one protein at a time of ~300 total proteins but here refers to columns/variables 2:5 of the mtcars dataset. The following code produces a long-form table containing the stats for each main effect and interaction.
library(tidyverse)
models <- lapply(mtcars[2:5], function(p)
(summary(lm(mpg ~ p * wt + qsec, data = mtcars))))
models_stats <- lapply(models, tidy)
models_table <- do.call(rbind, models_stats)
models_table
OUTPUT:
# A tibble: 20 × 5
term estimate std.error statistic p.value
* <chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 41.6 8.20 5.08 0.0000247
2 p -3.39 0.963 -3.52 0.00154
3 wt -10.8 2.40 -4.51 0.000113
4 qsec 0.757 0.349 2.17 0.0390
5 p:wt 0.977 0.317 3.08 0.00473
6 (Intercept) 30.4 5.82 5.22 0.0000167
7 p -0.0378 0.0138 -2.73 0.0109
8 wt -7.61 1.26 -6.04 0.00000188
9 qsec 0.780 0.291 2.68 0.0123
10 p:wt 0.0106 0.00298 3.55 0.00143
11 (Intercept) 40.3 7.68 5.25 0.0000156
12 p -0.106 0.0263 -4.04 0.000395
13 wt -8.68 1.29 -6.72 0.000000328
14 qsec 0.503 0.361 1.39 0.174
15 p:wt 0.0278 0.00730 3.81 0.000733
16 (Intercept) -7.79 11.3 -0.692 0.495
17 p 7.52 2.81 2.68 0.0124
18 wt 2.80 3.21 0.873 0.390
19 qsec 0.874 0.246 3.55 0.00142
20 p:wt -2.12 0.926 -2.29 0.0301
I am only interested in the interaction and so this is the only coefficient needing to be extracted. How can I: a) remove the p, wt, and qsec rows and keep only the q:wt row? b) make each row identifiable by its variable (i.e. protein name)? c) convert this into an easy to read table
You could try using grep
for the interaction and then rename according to what you specify in your lapply
:
short_table <- models_table[grep("\\:", models_table$term),]
short_table["p_name"] <- names(mtcars[2:5])
Output:
term estimate std.error statistic p.value p_name
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 p:wt 0.977 0.317 3.08 0.00473 cyl
2 p:wt 0.0106 0.00298 3.55 0.00143 disp
3 p:wt 0.0278 0.00730 3.81 0.000733 hp
4 p:wt -2.12 0.926 -2.29 0.0301 drat
Alternatively you could modify the lapply
statement and ignore the broom::tidy()
. For instance:
models2 <- lapply(mtcars[2:5], function(p)
(summary(lm(mpg ~ p * wt + qsec, data = mtcars)))$coefficients["p:wt",])
models2_table <- do.call(rbind, models2)
Which would provide all the statistics for the interaction term:
Estimate Std. Error t value Pr(>|t|)
cyl 0.7571610 0.3490572 2.169160 0.039044270
disp 0.7796404 0.2905776 2.683071 0.012301653
hp 0.5031633 0.3607684 1.394699 0.174476446
drat 0.8739726 0.2458620 3.554728 0.001418666