Tags: r, linear-regression, modeling, cross-validation, r-caret

R caret package, Error: Please make sure `y` is a factor or numeric value


I'm trying to use the caret package to cross-validate a model that I made. It depends on 3 variables, but the data set I used has many more than that. To make a reproducible example, I created variables a, b, c, d, and e, but only use a, b, and c to predict.

a <- rnorm(10)
b <- rnorm(10)
c <- rnorm(10)
d <- rnorm(10)
e <- rnorm(10)
y <- rnorm(10)
df <- data.frame(a,b,c,d,e,y, stringsAsFactors=FALSE)

library(caret)
model <- train(
df$y ~ df$a + df$b + df$c, x = df,
method = "lm",
trControl = trainControl(
method = "cv", number = 10,
verboseIter = TRUE, 
))

This gives error: Please make sure y is a factor or numeric value

I have tried several ways to change y but no luck. Anyone know from experience why this isn't working? I have googled for a couple hours and can't find the exact same problem.


Solution

  • You should use either a formula (together with a data argument) or the x and y arguments; you're mixing both. With a formula, you can write:

    model <- train(
        y ~ a + b + c, data = df,
        method = "lm",
        trControl = trainControl(
            method = "cv", number = 10,
            verboseIter = TRUE
        ))
    

    (You don't need to write df$y, df$a, etc.: because you supply the data argument, R knows to look for those variables in that data frame.)
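
    For completeness, here is a rough sketch of the other option, the x and y arguments, which the formula example doesn't cover. It rebuilds the question's data frame so that it runs on its own. The seed, the name model_xy, and number = 5 (chosen so each hold-out fold of the 10-row data keeps more than one row) are my own assumptions, not part of the question:

```r
library(caret)

set.seed(1)  # arbitrary seed (assumption), only so the sketch is repeatable

# Same shape of data as in the question: five predictors and an outcome
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10),
                 d = rnorm(10), e = rnorm(10), y = rnorm(10))

# x/y interface: no formula and no data argument.
# x holds only the predictors we want; y is a plain numeric vector.
model_xy <- train(
    x = df[, c("a", "b", "c")],
    y = df$y,
    method = "lm",
    trControl = trainControl(method = "cv", number = 5)
)

# Coefficients of the final linear model fitted on all rows
print(coef(model_xy$finalModel))
```

    Either way, the point is to pick one interface: a formula plus data, or x plus y, but not a formula combined with x.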