Search code examples
rlinear-regressionlapplydatalistpolynomials

How to order models from best to worst based on AIC from the result of lapply in R


Suppose I have following df.

ind1 <- rnorm(100)
ind2 <- rnorm(100)
ind3 <- rnorm(100)
ind4 <- rnorm(100)
ind5 <- rnorm(100)
dep <- rnorm(100, mean=ind1)

df <- data.frame(dep,ind1, ind2, ind3, ind4, ind5)

I calculated 3rd order polynomial regression between dep variable and each of the ind variables separately using lapply() as follow:

polys <- lapply(df[,-1], function(x) summary(lm(dep ~ poly(x, 3, raw = 
TRUE), data = df)))

Now I want to list/order best models based on AIC. I tried these without success.

stepAIC(polys)
polys$AIC

Do you have any idea how I should do this in R?

Thanks in advance!


Solution

  • Currently, you are conflating list objects with methods stepAIC and AIC that run on specific components. Specifically,

    1. The function stepAIC in MASS package runs on individual model objects (e.g., lm class types), not summary of model objects and not list of model objects (i.e., polys) with your failed attempt: stepAIC(polys).

    2. The function AIC in MASS package runs on individual model objects usually after the stepAIC call and is not an item of the return with your failed attempt: polys$AIC.

    Consider the following re-factoring of your code with seeded data for reproducibility to sort list of lm objects, polys, by ascending AIC values:

    library(MASS)
    
    set.seed(50619)    
    #... same data build code ...
    
    # LIST OF lm OBJECTS (NO SUMMARY)
    polys <- lapply(df[,-1], function(x) lm(dep ~ poly(x, 3, raw = TRUE), data = df))
    
    # SORTED VECTOR OF AIC VALUES (USING sapply)
    AIC_vec <- sort(sapply(polys, function(x) AIC(stepAIC(x))))
    AIC_vec
    #     ind1     ind3     ind2     ind4     ind5 
    # 297.1865 352.3694 352.8260 352.8260 352.8260
    
    # SORTED polys BY AIC (BEST TO WORST)
    sorted_polys <- polys[names(AIC_vec)]