Search code examples
rggplot2ggpmisc

Is it possible to extract slope and intercept from multiple fitted lines into a tibble?


I'm trying to compare the slope and intercept of many separate fitted lines and would like to extract this information from the equations that are shown using stat_poly_eq. I am able to plot all of the data and lines but since in my actual data I has over 50 equations, I'd like a simple way to extract the slope and intercept of each line.

Below is code to generate a similar plot with mtcars.

I'd like to add an output as a tibble with columns for cyl, gear, slope, intercept.

library(tidyverse)
library(ggpmisc)

ggplot(mtcars, aes(x = wt, y = mpg, color = as.character(cyl))) +
  geom_point()+
  facet_wrap(gear ~ .) +
  stat_poly_line(fullrange = TRUE, se = FALSE) +
  stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~~~")), 
               parse=TRUE,label.x.npc = "right")


Solution

  • You can use nlme::lmList to help run all regression.

    library(nlme)
    
    #create a new grouping variable for the nested grouping factor
    mtcars2 <- mtcars %>% mutate(gear_cyl = interaction(mtcars$gear, mtcars$cyl))
    
    fm1 <- lmList(mpg ~ wt | gear_cyl, mtcars2)
    

    Calling the summary for coefficient and r-squared will return you the same statistics shown on your ggplot.

    summary(fm1)$coef
    
    , , (Intercept)
    
        Estimate Std. Error  t value     Pr(>|t|)
    3.4 21.50000        NaN      NaN          NaN
    4.4 40.85910   4.254923 9.602782 4.813403e-08
    5.4 41.01754        NaN      NaN          NaN
    3.6 64.70408        NaN      NaN          NaN
    4.6 30.20964  12.042632 2.508558 2.326993e-02
    5.6 19.70000        NaN      NaN          NaN
    3.8 25.05942   4.527584 5.534834 4.526681e-05
    5.8 22.14000        NaN      NaN          NaN
    
    , , wt
    
          Estimate Std. Error    t value    Pr(>|t|)
    3.4         NA        NaN         NA          NA
    4.4  -5.859279   1.741259 -3.3649675 0.003940916
    5.4  -7.017544        NaN        NaN         NaN
    3.6 -13.469388        NaN        NaN         NaN
    4.6  -3.380894   3.866794 -0.8743402 0.394867930
    5.6         NA        NaN         NA          NA
    3.8  -2.438894   1.085886 -2.2459951 0.039178704
    5.8  -2.000000        NaN        NaN         NaN
    
    summary(fm1)$r.squared
    
    [1] 0.0000000 0.5358963 1.0000000 1.0000000 0.8095674 0.0000000 0.4561613 1.0000000
    
    

    To put everything into a tibble:

    output <- tibble(gear_cyl = names(fm1),
                     intercept = summary(fm1)$coef[,1,1],
                     slope = summary(fm1)$coef[,1,2],
                     r_sq = summary(fm1)$r.squared)
    
    output %>% separate(gear_cyl, c("gear", "cyl"), sep="\\.")
    
    # A tibble: 8 x 5
      gear  cyl   intercept  slope  r_sq
      <chr> <chr>     <dbl>  <dbl> <dbl>
    1 3     4          21.5  NA    0    
    2 4     4          40.9  -5.86 0.536
    3 5     4          41.0  -7.02 1    
    4 3     6          64.7 -13.5  1    
    5 4     6          30.2  -3.38 0.810
    6 5     6          19.7  NA    0    
    7 3     8          25.1  -2.44 0.456
    8 5     8          22.1  -2.00 1