Search code examples
rlapplymodel-fitting

Iterate over list of models and compare model fit using AIC, BIC


I have dataset with multiple outcome variables that I would like to test against one predictor. on exploratory analysis I noted some of the relationships are polynomial to the degree 2 rather than linear. I would like to look at the BIC and AIC to make my decision of which is the best model to run.

I have lapply function where I can iterate over multiple outcome variables but now I would like to add a second model and compare their fit. However when I run this function, it only saves the second model and I dont know how to get to 'outputs' to run through the next function. Can I have this within one function or do I need two?

Here is a example from the iris dataset

data(iris)
vars <- names(iris[2:4])

models2 <- lapply(vars, function(x) {
  model_list=list(
    mod1=lm(substitute(i ~ Sepal.Length, list(i=as.name(x))), data=iris),
    mod2=lm(substitute(i ~ poly(Sepal.Length,2), list(i=as.name(x))), data=iris))
})

y <- lapply(models2, summary) #This only saves results from mod2

How do I then compare mod1 to mod2 fit and extract the following variable's?

data.frame(
  do.call(merge, list(BIC(mod1, mod2), AIC(mod1, mod2))), 
  logLik=sapply(list(mod1, mod2), logLik), 
  anova(mod1, mod2, test='Chisq'))

Solution

  • First, make the lm nicer using do.call and reformulate. Then lapply over the models like this:

    > models2 <- lapply(setNames(vars, vars), function(x) {
    +   list(
    +     mod1=do.call('lm', list(reformulate('Sepal.Length', x), quote(iris))),
    +     mod2=do.call('lm', list(reformulate('poly(Sepal.Length, 2)', x), quote(iris)))
    +   )
    + })
    > 
    > (res <- lapply(models2, \(x) data.frame(
    +   with(x, do.call('merge', list(BIC(mod1, mod2), AIC(mod1, mod2)))),
    +   logLik=with(x, sapply(list(mod1, mod2), logLik)),
    +   with(x, anova(mod1, mod2))
    + )))
    $Sepal.Width
      df      BIC      AIC    logLik Res.Df      RSS Df Sum.of.Sq        F     Pr..F.
    1  3 188.4963 179.4644 -86.73221    148 27.91566 NA        NA       NA         NA
    2  4 189.8107 177.7682 -84.88410    147 27.23618  1 0.6794752 3.667285 0.05743267
    
    $Petal.Length
      df      BIC      AIC    logLik Res.Df      RSS Df Sum.of.Sq        F      Pr..F.
    1  3 396.1669 387.1350 -190.5675    148 111.4592 NA        NA       NA          NA
    2  4 389.8649 377.8223 -184.9112    147 103.3623  1  8.096848 11.51519 0.000887343
    
    $Petal.Width
      df      BIC      AIC    logLik Res.Df      RSS Df Sum.of.Sq        F      Pr..F.
    1  3 192.4030 183.3711 -88.68553    148 28.65225 NA        NA       NA          NA
    2  4 182.3757 170.3331 -81.16656    147 25.91907  1  2.733179 15.50122 0.000126881
    

    You could additionally rbind.

    > do.call('rbind', res)
                   df      BIC      AIC     logLik Res.Df       RSS Df Sum.of.Sq         F      Pr..F.
    Sepal.Width.1   3 188.4963 179.4644  -86.73221    148  27.91566 NA        NA        NA          NA
    Sepal.Width.2   4 189.8107 177.7682  -84.88410    147  27.23618  1 0.6794752  3.667285 0.057432671
    Petal.Length.1  3 396.1669 387.1350 -190.56750    148 111.45916 NA        NA        NA          NA
    Petal.Length.2  4 389.8649 377.8223 -184.91116    147 103.36231  1 8.0968484 11.515191 0.000887343
    Petal.Width.1   3 192.4030 183.3711  -88.68553    148  28.65225 NA        NA        NA          NA
    Petal.Width.2   4 182.3757 170.3331  -81.16656    147  25.91907  1 2.7331790 15.501223 0.000126881
    

    Data:

    > data(iris)
    > vars <- names(iris[2:4])