How to perform multiple linear model fits of one column against all (pairwise) using dplyr and broom

Given the mtcars data:

> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

What I want to do is to perform linear fit with y = mpg as the outcome with all other variables as predictors x , in one on one basis (i.e. from mpg ~ cyl up to mpg ~ carb).

How can I do that with dplyr?

So far I have this code (as suggested by Maurits):

 library(broom)
  library(tidyr)
  library(dplyr)
 lm(mpg ~ ., data = mtcars) %>% 
 # glance() # this fail
 tidy()

It gave me this result:

# A tibble: 11 × 5
   term        estimate std.error statistic p.value
   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1 (Intercept)  12.3      18.7        0.657  0.518 
 2 cyl          -0.111     1.05      -0.107  0.916 
 3 disp          0.0133    0.0179     0.747  0.463 
 4 hp           -0.0215    0.0218    -0.987  0.335 
 5 drat          0.787     1.64       0.481  0.635 
 6 wt           -3.72      1.89      -1.96   0.0633
 7 qsec          0.821     0.731      1.12   0.274 
 8 vs            0.318     2.10       0.151  0.881 
 9 am            2.52      2.06       1.23   0.234 
10 gear          0.655     1.49       0.439  0.665 
11 carb         -0.199     0.829     -0.241  0.812

But when I do

> tidy(lm(mpg ~ cyl, data = mtcars))
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    37.9      2.07      18.3  8.37e-18
2 cyl            -2.88     0.322     -8.92 6.11e-10

Notice the difference in estimate from -2.88 to -0.11.

Solution

You could reshape, groupby and then do the lm:

library(tidyverse)
mtcars %>%
  pivot_longer(-mpg) %>%
  group_by(name) %>%
  summarise(broom::tidy(lm(mpg~value, cur_data())), .groups='drop')

 name  term        estimate std.error statistic  p.value
   <chr> <chr>          <dbl>     <dbl>     <dbl>    <dbl>
 1 am    (Intercept)  17.1      1.12       15.2   1.13e-15
 2 am    value         7.24     1.76        4.11  2.85e- 4
 3 carb  (Intercept)  25.9      1.84       14.1   9.22e-15
 4 carb  value        -2.06     0.569      -3.62  1.08e- 3
 5 cyl   (Intercept)  37.9      2.07       18.3   8.37e-18
 6 cyl   value        -2.88     0.322      -8.92  6.11e-10
 7 disp  (Intercept)  29.6      1.23       24.1   3.58e-21
 8 disp  value        -0.0412   0.00471    -8.75  9.38e-10
 9 drat  (Intercept)  -7.52     5.48       -1.37  1.80e- 1
10 drat  value         7.68     1.51        5.10  1.78e- 5
11 gear  (Intercept)   5.62     4.92        1.14  2.62e- 1
12 gear  value         3.92     1.31        3.00  5.40e- 3
13 hp    (Intercept)  30.1      1.63       18.4   6.64e-18
14 hp    value        -0.0682   0.0101     -6.74  1.79e- 7
15 qsec  (Intercept)  -5.11    10.0        -0.510 6.14e- 1
16 qsec  value         1.41     0.559       2.53  1.71e- 2
17 vs    (Intercept)  16.6      1.08       15.4   8.85e-16
18 vs    value         7.94     1.63        4.86  3.42e- 5
19 wt    (Intercept)  37.3      1.88       19.9   8.24e-19
20 wt    value        -5.34     0.559      -9.56  1.29e-10

If only interested in the coefficient values, you can directly use base R:

lm(mpg~ind/values+0, cbind(mpg=mtcars$mpg,stack(mtcars, -mpg)))

Call:
lm(formula = mpg ~ ind/values + 0, data = cbind(mpg = mtcars$mpg, 
    stack(mtcars, -mpg)))

Coefficients:
        indcyl         inddisp           indhp         inddrat           indwt  
      37.88458        29.59985        30.09886        -7.52462        37.28513  
       indqsec           indvs           indam         indgear         indcarb  
      -5.11404        16.61667        17.14737         5.62333        25.87233  
 indcyl:values  inddisp:values    indhp:values  inddrat:values    indwt:values  
      -2.87579        -0.04122        -0.06823         7.67823        -5.34447  
indqsec:values    indvs:values    indam:values  indgear:values  indcarb:values  
       1.41212         7.94048         7.24494         3.92333        -2.05572

Note that the indVAR is the intercept of the variable and indVAR:value is the coefficient. eg intercept for cyl = 37.88458 while the coefficient for cyl = -2.87579. This is the same results given by the tidyverse function