Using poly() inside a caret train() formula results in a data frame error


Here's the code that I am running:

library(caret)
library(ISLR)
data('Auto')
cverror <- c()
for(i in 1:5){
  train_control <- trainControl(method='LOOCV')
  models <- train(mpg~poly(horsepower,i), data = Auto, trControl=train_control, method='glm')
  cverror[i] <- (models$results$RMSE)^2
}

cverror

I'm trying to calculate the MSE (mean squared error) for polynomial degrees 1 through 5 in a loop, so I don't have to write out each model by hand. The error message I receive is:

 Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) : 
  undefined columns selected 

Why am I receiving this error? My study mates said that this code runs fine on their machine, but it doesn't work on my personal or work computers. I have the latest RStudio and R versions installed with all packages up to date.

The following line of code works just fine:

train(mpg~poly(horsepower,2), data = Auto, trControl=train_control, method='glm')

This relates to the tutorial on pages 192-193 of the ISLR text, which I'm expanding upon.


Solution

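  • The value of i is not substituted into the formula. Even when i is 2, the formula is literally mpg ~ poly(horsepower, i), so i is treated as one of the formula's variables. As the error message shows, train() then tries to select the columns all.vars(Terms) from the data, and Auto has no column named i.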

    Try this:

    library(caret)
    library(ISLR)
    data('Auto')
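    cverror <- numeric(5)
    # The resampling setup does not depend on i, so create it once
    train_control <- trainControl(method='LOOCV')
    for(i in 1:5){
      # bquote() inserts the current value of i in place of .(i)
      f <- bquote(mpg ~ poly(horsepower, .(i)))
      models <- train(as.formula(f), data = Auto, trControl=train_control, method='glm')
      # caret reports RMSE; square it to get the MSE
      cverror[i] <- (models$results$RMSE)^2
    }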
    
    cverror
    #[1] 24.23151 19.24821 19.33498 19.42443 19.03321
    

    PS: Higher-degree polynomials pretty much guarantee over-fitting. I wouldn't recommend a polynomial of degree higher than 2, maybe 3. There are usually better models available in such a case. Higher-degree polynomials are rare for "natural" processes.
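
To see why the original loop fails, the formula can be inspected without fitting anything. The sketch below uses base R only. The data frame `toy` is a hypothetical stand-in for Auto with made-up values, and the last step just repeats the column selection shown in the error message, on the assumption that this is what fails inside train().

```r
# Base R only; caret and ISLR are not needed for this check.
i <- 2

# Written directly, the formula keeps the symbol `i` unevaluated.
f_literal <- mpg ~ poly(horsepower, i)
vars_literal <- all.vars(f_literal)   # c("mpg", "horsepower", "i")

# bquote() replaces .(i) with its current value before the formula is built.
f_subst <- as.formula(bquote(mpg ~ poly(horsepower, .(i))))
vars_subst <- all.vars(f_subst)       # c("mpg", "horsepower")

# Hypothetical stand-in for Auto: it has no column named "i".
toy <- data.frame(mpg = c(18, 15, 24), horsepower = c(130, 165, 95))

# Same kind of selection as in the error message:
# `[.data.frame`(data, , all.vars(Terms), drop = FALSE)
msg <- tryCatch(
  {
    toy[, vars_literal, drop = FALSE]
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# With the substituted formula the selection succeeds.
ok <- toy[, vars_subst, drop = FALSE]

print(vars_literal)
print(vars_subst)
print(msg)
```

Because `i` shows up in `all.vars()` for the literal formula, any code that subsets the data by those names will look for a column called i. Once the degree is substituted, only mpg and horsepower remain.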