Tags: r, dplyr, automation, data-wrangling

How to incrementally add multiple model outputs to a DF in an automated way?


For exploratory purposes, I'm running 156 different regression models. For each model, I extract AIC, BIC, logLik and R² (via r.squaredGLMM).

Copying each metric for each model (624x Ctrl+C/Ctrl+V) and pasting it into a blank Excel file simply feels very dumb, because it is.

QUESTION: Does anyone have a tip on how to code a workaround for this time-wasting work? Perhaps by showing how I could sequentially add each model's metrics to a new data frame.

Code example:

a <- lmer(y ~ covariate¹ + ... + covariateⁿ + Time + (1 + Time | ID))
AIC(a)
BIC(a)
logLik(a)
r.squaredGLMM(a)

Thanks. Cheers

EDIT: I know that identifying the models might be a concern (for example, different sorts of model ending up in different columns), but in this case it doesn't matter. If I got a DF of 156 rows x 4 columns (4 metrics of interest per model), I would already be happy.


Solution

  • Titorelli, you have probably solved this by now, but this might help others in the future.

    I realise answers that just reference links are not great for SO, but I suggest looking at Hadley's R for Data Science book, Chapter 25, "Many Models". Hadley explicitly looks at dealing with many models using list-columns and also references broom (like deschen did above):

    In this chapter you’re going to learn three powerful ideas that help you to work with large numbers of models with ease:

    1 Using many simple models to better understand complex datasets.

    2 Using list-columns to store arbitrary data structures in a data frame. For example, this will allow you to have a column that contains linear models.

    3 Using the broom package, by David Robinson, to turn models into tidy data. This is a powerful technique for working with large numbers of models because once you have tidy data, you can apply all of the techniques that you’ve learned about earlier in the book.

    I don't want to replicate his book, since the examples are well laid out, but look at 25.2.4 Model Quality, which uses purrr to map over the models and broom to get what you want (using gapminder as the example):

    by_country %>% 
      mutate(glance = map(model, broom::glance)) %>% 
      unnest(glance)
    
    #> # A tibble: 142 x 17 
    #> # Groups:   country, continent [142]   
    #>   country continent data  model resids r.squared adj.r.squared sigma statistic   
    #>   <fct>   <fct>     <lis> <lis> <list>     <dbl>         <dbl> <dbl>     <dbl>
    #> 1 Afghan… Asia      <tib… <lm>  <tibb…     0.948         0.942 1.22      181.    
    #> 2 Albania Europe    <tib… <lm>  <tibb…     0.911         0.902 1.98      102.    
    #> 3 Algeria Africa    <tib… <lm>  <tibb…     0.985         0.984 1.32      662.    
    #> # … with 136 more rows, and 8 more variables: p.value <dbl>, df <dbl>,  
    #> #   logLik <dbl>, AIC <dbl>, BIC <dbl>, deviance <dbl>, df.residual <int>, nobs <int>
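
For the original question (one row per model, one column per metric), the same idea can be sketched in base R without any extra packages. This is only an illustration under stated assumptions: it uses `lm()` on the built-in `mtcars` data as a stand-in for the `lmer()` models, and the three formulas and the helper name `extract_metrics` are hypothetical. With mixed models, the `lm()` call would become the `lmer()` call and the R² line would be replaced by `MuMIn::r.squaredGLMM(fit)`, which returns a marginal and a conditional R² rather than a single value.

```r
# Candidate model formulas; these three are illustrative stand-ins for the 156 models.
formulas <- list(
  m1 = mpg ~ wt,
  m2 = mpg ~ wt + hp,
  m3 = mpg ~ wt + hp + qsec
)

# Fit every model once and keep the fitted objects in a named list.
fits <- lapply(formulas, function(f) lm(f, data = mtcars))

# Hypothetical helper: pull the metrics of interest out of one fitted model
# and return them as a one-row data frame.
extract_metrics <- function(fit) {
  data.frame(
    AIC    = AIC(fit),
    BIC    = BIC(fit),
    logLik = as.numeric(logLik(fit)),
    R2     = summary(fit)$r.squared  # assumption: swap in r.squaredGLMM() for lmer fits
  )
}

# Stack the one-row data frames: one row per model, one column per metric.
metrics <- do.call(rbind, lapply(fits, extract_metrics))

# Add the model names as an identifying column and drop the row names.
metrics <- cbind(model = names(fits), metrics)
rownames(metrics) <- NULL

print(metrics)
```

The resulting `metrics` data frame can then be exported in one step, for example with `write.csv(metrics, "model_metrics.csv", row.names = FALSE)`, instead of copying values by hand.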