r purrr summary tidy broom

R: filter and aggregate results from an lapply model summary


I am trying to filter and aggregate results from multiple regression models, each fitted on a subset of a dataset using dlply.

This is how I ran my models:

library(plyr)

data("mtcars")

# fit mpg ~ hp separately for each cylinder group (4, 6, 8)
models = dlply(mtcars, .(cyl), function(df) lm(mpg ~ hp, data = df))
lapply(models, summary)
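Each element of models is an lm fit named after its cyl value, so the slope row can be pulled out by name instead of by position, and each summary only needs to be computed once. The sketch below uses base R's split() as a stand-in for plyr::dlply() (an assumption: both give a list named "4", "6", "8" here) so that it runs without plyr.

```r
# Fit mpg ~ hp separately for each cylinder group.
# split() is a base-R stand-in for plyr::dlply() in this sketch.
models <- lapply(split(mtcars, mtcars$cyl), function(df) lm(mpg ~ hp, data = df))

names(models)  # "4" "6" "8"

# Summarise each model once and keep the results.
summaries <- lapply(models, summary)

# coef() on a summary.lm object returns a matrix with one row per term;
# index the hp row by name rather than by position.
coef(summaries[["4"]])["hp", ]
```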

Right now I am combining the results from the different models (cylinders 4, 6 and 8) like this:

rbind(
  c("Cylinder 4", coef(lapply(models, summary)$`4`)[2,]),
  c("Cylinder 6", coef(lapply(models, summary)$`6`)[2,]),
  c("Cylinder 8", coef(lapply(models, summary)$`8`)[2,])
)

Is there a way to summarize this more efficiently?


Solution

  • We can use tidy() from broom instead of summary() and coef(), and pipe the list of models straight into map2_df(), which binds the per-model rows into a single data frame.

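    library(plyr)        # dlply(); load before the tidyverse so dplyr's verbs take precedence
    library(tidyverse)   # purrr::map2_df(), dplyr::mutate(), %>%
    library(broom)       # tidy(); not attached by library(tidyverse)
    
    dlply(mtcars, .(cyl), function(df)
      lm(mpg ~ hp, data = df)) %>%
      map2_df(
        .,
        names(.),
        # keep the slope row (row 2 = hp) and tag it with its cylinder group
        ~ tidy(.x)[2, ] %>% mutate(Cylinder = .y)
      )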
    

    Output

      term  estimate std.error statistic p.value Cylinder
      <chr>    <dbl>     <dbl>     <dbl>   <dbl> <chr>   
    1 hp    -0.113      0.0612    -1.84   0.0984 4       
    2 hp    -0.00761    0.0266    -0.286  0.786  6       
    3 hp    -0.0142     0.0139    -1.02   0.326  8
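If the tidyverse and broom are not available, a rough base-R equivalent can build a similar one-row-per-cylinder table. This is a sketch rather than part of the answer above: it assumes split() in place of dlply(), filters the hp row by name, and reuses the column names from the output above.

```r
# Base-R sketch: one row per cylinder group holding the hp coefficient.
models <- lapply(split(mtcars, mtcars$cyl), function(df) lm(mpg ~ hp, data = df))

hp_rows <- lapply(names(models), function(k) {
  # Named vector: Estimate, Std. Error, t value, Pr(>|t|)
  est <- coef(summary(models[[k]]))["hp", ]
  data.frame(
    term      = "hp",
    estimate  = est[["Estimate"]],
    std.error = est[["Std. Error"]],
    statistic = est[["t value"]],
    p.value   = est[["Pr(>|t|)"]],
    Cylinder  = k
  )
})

# Stack the per-group rows into one data frame
result <- do.call(rbind, hp_rows)
result
```

Each summary is computed once per model, which avoids re-running lapply(models, summary) for every row as in the rbind() version in the question.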