I'm trying to use the caret package to cross-validate a model that I made. It depends on 3 variables, but the data set I used has many more than that. To make a reproducible example, I created variables a, b, c, d and e, but only use a, b and c to predict.
a <- rnorm(10)
b <- rnorm(10)
c <- rnorm(10)
d <- rnorm(10)
e <- rnorm(10)
y <- rnorm(10)
df <- data.frame(a,b,c,d,e,y, stringsAsFactors=FALSE)
library(caret)
model <- train(
df$y ~ df$a + df$b + df$c, x = df,
method = "lm",
trControl = trainControl(
method = "cv", number = 10,
verboseIter = TRUE,
))
This gives the error: Please make sure y is a factor or numeric value
I have tried several ways to change y, but no luck. Does anyone know from experience why this isn't working? I have googled for a couple of hours and can't find the exact same problem.
You should use either a formula (with a data argument) or the x and y arguments; you're mixing both. To use a formula:
model <- train(
y ~ a + b + c, data = df,
method = "lm",
trControl = trainControl(
method = "cv", number = 10,
verboseIter = TRUE
))
(You don't need to write df$y, df$a, etc., because you supply the data argument, so R knows to look in that data frame.)
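For completeness, the other option mentioned above (the x and y arguments instead of a formula) might look roughly like the sketch below. It rebuilds the example data from the question; the set.seed call and the name model_xy are my own additions for illustration. Note that x should contain only the predictor columns, not the whole data frame.

```r
library(caret)

set.seed(1)  # assumption: added only so the sketch is repeatable
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10),
                 d = rnorm(10), e = rnorm(10), y = rnorm(10))

# x/y interface: pass the predictors and the outcome separately, no formula
model_xy <- train(
  x = df[, c("a", "b", "c")],  # only the columns used for prediction
  y = df$y,                    # numeric outcome vector
  method = "lm",
  trControl = trainControl(method = "cv", number = 10)
)
```

With only 10 rows and 10 folds, each hold-out set has a single observation, so expect a warning about missing values in the resampled performance measures; on a real data set with more rows this should not be an issue.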