Search code examples
rtidyverseregressionlm

extract coefficients from more than one linear regression model with multiple terms in r


I am trying to generate a data frame of intercepts and coefficients for multiple different linear regression models generated from subsets of a data set. I cannot share my data unfortunately but I can explain it using the mtcars set. I am creating a regression model predicting mpg from cyl, hp, and wt for each value of carb. After digging around for a while, I found many examples that do what I want but only for a model with 1 predictor (like mpg~wt). It all falls apart when I add the other terms. This is what I have based my work so far on: https://community.rstudio.com/t/extract-slopes-by-group-broom-dplyr/2751/8 Efficient way to extract coefficients from many linear regression lines

This is what I have tried

library(tidyverse);library(broom)
df <- mtcars
tryme <- df %>%
  split(.$carb)%>%
  map(~lm(mpg~cyl+hp+wt,data=.x)) %>%
  map_df(tidy)

with this result

    term            estimate    std.error   statistic   p.value
1   (Intercept) 46.034633   7.68430306  5.99073626  9.31E-03
2   cyl         2.650503624 4.14371413  0.63964442  5.68E-01
3   hp          -0.230007961    0.1573354   -1.46189576 2.40E-01
4   wt          -5.231999683    9.23136027  -0.56676368 6.11E-01
5   (Intercept) 39.84451509 2.95537984  13.48202845 1.03E-05
6   cyl         -0.846094229    0.93995084  -0.90014732 4.03E-01
7   hp          -0.007452737    0.03998485  -0.18638901 8.58E-01
8   wt          -4.133340298    1.42757472  -2.89535829 2.75E-02
9   (Intercept) 17.50267062 22.13706712 0.79064993  5.74E-01
10  cyl         NA          NA          NA          NA
11  hp          NA          NA          NA          NA
12  wt          -0.3115727  5.73067255  -0.05436931 9.65E-01
13  (Intercept) 45.33390978 12.93999647 3.50339429  1.28E-02
14  cyl         -4.195214198    3.492613    -1.20116778 2.75E-01
15  hp          0.029361878 0.04927895  0.59583008  5.73E-01
16  wt          -1.239041102    1.03937377  -1.19210349 2.78E-01
17  (Intercept) 19.7            NaN         NaN         NaN
18  cyl         NA          NA          NA          NA
19  hp          NA          NA          NA          NA
20  wt          NA          NA          NA          NA
21  (Intercept) 15          NaN         NaN         NaN
22  cyl         NA          NA          NA          NA
23  hp          NA          NA          NA          NA
24  wt          NA          NA          NA          NA

what I want is a table that looks like this:

carb    intercept   cyl         hp          wt
1   46.034633   2.650503624 -0.230007961    -5.231999683
2   39.84451509 -0.846094229    -0.007452737    -4.133340298
3   17.50267062 NA          NA          -0.3115727
4   45.33390978 -4.195214198    0.029361878 -1.239041102
6   19.7            NA          NA          NA
8   15          NA          NA          NA

I don't know how to bring the value of the grouping variable in. If I can get that added to what I already have I know how to transpose the data into the form I need.

I am updating a bit, as an answer here gave me what I wanted, but is causing an error that I can't seem to fix. the solution uses lmList from the nlme library. The code I am using is this

fit <- lmList(SentLength~Unified_UPPER+Unified_LOWER+GRADE | SentType, data=df, na.omit)
and this is the data I am running it on:
SentType    SentLength  Unified_UPPER   Unified_LOWER   GRADE
Jail        0.06578947  0.06666667      0.06666667      3
Other       0           6               0               4
Other       0           6               6               1
Probation   6           0               0               1
Other       0           9               0               6
Jail        1.41447368  6               0               3
Probation   6           0               0               1
Probation   6           0               0               1
Probation   12          0               0               2
Jail        6           16              6               6
Prison      36          27              21              5
Probation   24          9               0               6
Jail        0.23026316  0.5             0               1
Jail        0.09868421  0.06666667      0.06666667      1
Probation   60          27              21              6
Probation   6           1               0               1
Probation   24          0               0               3
Prison      36          54              36              8
Probation   24          9               0               6
Probation   6           0               0               1
Probation   0           0.06666667      0.06666667      1
Probation   24          0               0               3
Jail        0.13157895  1               0.06666667      1
Jail        0.09868421  1               0.1             1
Prison      42          60              48              8
Probation   6           0               0               2
Other       0           0               0               1
Prison      6           6               6               1
Prison      6           6               6               1
Other       0           16              9               7

I get this error: Error in na.fail.default(data) : missing values in object despite no missing values in the data. Can anyone help with this?


Solution

  • It always amazes me that people insist on using tidyverse mumbo-jumbo for this when it is so incredibly easy with package nlme, which is a "recommended package", meaning it should be part of your R installation and you don't even need to install it.

    library(nlme)
    fit <- lmList(mpg ~ cyl + hp + wt | carb, data = mtcars)
    coef(fit)
    #  (Intercept)        cyl           hp         wt
    #1    46.03463  2.6505036 -0.230007961 -5.2319997
    #2    39.84452 -0.8460942 -0.007452737 -4.1333403
    #3    17.50267         NA           NA -0.3115727
    #4    45.33391 -4.1952142  0.029361878 -1.2390411
    #6    19.70000         NA           NA         NA
    #8    15.00000         NA           NA         NA
    

    Note that the row names are the carb values.

    There is also a summary method producing convenient output. However, read help("summary.lmList") regarding the pool parameter if you use it.