Given the mtcars
data:
> head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
What I want to do is to perform linear fit with y = mpg
as the outcome with all other variables as predictors x
, in one on one basis (i.e. from mpg ~ cyl
up to mpg ~ carb
).
How can I do that with dplyr?
So far I have this code (as suggested by Maurits):
library(broom)
library(tidyr)
library(dplyr)
lm(mpg ~ ., data = mtcars) %>%
# glance() # this fail
tidy()
It gave me this result:
# A tibble: 11 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 12.3 18.7 0.657 0.518
2 cyl -0.111 1.05 -0.107 0.916
3 disp 0.0133 0.0179 0.747 0.463
4 hp -0.0215 0.0218 -0.987 0.335
5 drat 0.787 1.64 0.481 0.635
6 wt -3.72 1.89 -1.96 0.0633
7 qsec 0.821 0.731 1.12 0.274
8 vs 0.318 2.10 0.151 0.881
9 am 2.52 2.06 1.23 0.234
10 gear 0.655 1.49 0.439 0.665
11 carb -0.199 0.829 -0.241 0.812
But when I do
> tidy(lm(mpg ~ cyl, data = mtcars))
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 37.9 2.07 18.3 8.37e-18
2 cyl -2.88 0.322 -8.92 6.11e-10
Notice the difference in estimate from -2.88 to -0.11.
You could reshape, groupby and then do the lm:
library(tidyverse)
mtcars %>%
pivot_longer(-mpg) %>%
group_by(name) %>%
summarise(broom::tidy(lm(mpg~value, cur_data())), .groups='drop')
name term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 am (Intercept) 17.1 1.12 15.2 1.13e-15
2 am value 7.24 1.76 4.11 2.85e- 4
3 carb (Intercept) 25.9 1.84 14.1 9.22e-15
4 carb value -2.06 0.569 -3.62 1.08e- 3
5 cyl (Intercept) 37.9 2.07 18.3 8.37e-18
6 cyl value -2.88 0.322 -8.92 6.11e-10
7 disp (Intercept) 29.6 1.23 24.1 3.58e-21
8 disp value -0.0412 0.00471 -8.75 9.38e-10
9 drat (Intercept) -7.52 5.48 -1.37 1.80e- 1
10 drat value 7.68 1.51 5.10 1.78e- 5
11 gear (Intercept) 5.62 4.92 1.14 2.62e- 1
12 gear value 3.92 1.31 3.00 5.40e- 3
13 hp (Intercept) 30.1 1.63 18.4 6.64e-18
14 hp value -0.0682 0.0101 -6.74 1.79e- 7
15 qsec (Intercept) -5.11 10.0 -0.510 6.14e- 1
16 qsec value 1.41 0.559 2.53 1.71e- 2
17 vs (Intercept) 16.6 1.08 15.4 8.85e-16
18 vs value 7.94 1.63 4.86 3.42e- 5
19 wt (Intercept) 37.3 1.88 19.9 8.24e-19
20 wt value -5.34 0.559 -9.56 1.29e-10
If only interested in the coefficient values, you can directly use base R:
lm(mpg~ind/values+0, cbind(mpg=mtcars$mpg,stack(mtcars, -mpg)))
Call:
lm(formula = mpg ~ ind/values + 0, data = cbind(mpg = mtcars$mpg,
stack(mtcars, -mpg)))
Coefficients:
indcyl inddisp indhp inddrat indwt
37.88458 29.59985 30.09886 -7.52462 37.28513
indqsec indvs indam indgear indcarb
-5.11404 16.61667 17.14737 5.62333 25.87233
indcyl:values inddisp:values indhp:values inddrat:values indwt:values
-2.87579 -0.04122 -0.06823 7.67823 -5.34447
indqsec:values indvs:values indam:values indgear:values indcarb:values
1.41212 7.94048 7.24494 3.92333 -2.05572
Note that the indVAR
is the intercept of the variable and indVAR:value
is the coefficient. eg intercept for cyl = 37.88458
while the coefficient for cyl = -2.87579
. This is the same results given by the tidyverse function