I am trying to fit a randomForest model on the iris data without the variable Petal.Length. The code gives me errors at the prediction step. How can I code this properly? Thanks for your help.
Richard
data (iris)
attach (iris)
iris$id <- 1:nrow(iris)
library (dplyr)
train <- iris %>%
  sample_frac(0.8)
test <- iris %>%
  anti_join(train, by = "id")
library (randomForest)
library (caret)
fit <- randomForest(Species ~ Sepal.Length + Sepal.Width + Petal.Width,
                    data = train)
prediction <- predict (fit, test [1:2 , 4])
confusionMatrix (test$Species,prediction)
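Your subsetting of the test dataset is wrong: test[1:2, 4] keeps only the first two rows of column 4 (Petal.Width) and returns a plain vector, so predict() does not receive the predictor columns the model was trained on. Just use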
prediction <- predict (fit, newdata = test)
in place of
predict (fit, test [1:2 , 4])
predict() will automatically pick out the predictor variables it needs from test by name. Alternatively, you can drop the unused column explicitly:
prediction <- predict (fit, subset(test, select = -c(Petal.Length)))
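Putting it together, a minimal sketch of the whole workflow could look like the following. The set.seed() call is an assumption added only so the split is reproducible; it is not part of the original code. The arguments to confusionMatrix() are also named here, since caret expects the predictions as data and the true classes as reference.

```r
library(dplyr)
library(randomForest)
library(caret)

set.seed(42)  # assumption: added only for a reproducible train/test split

data(iris)
iris$id <- 1:nrow(iris)

# 80% of the rows for training, the remaining 20% for testing
train <- iris %>% sample_frac(0.8)
test  <- iris %>% anti_join(train, by = "id")

# Fit the model without Petal.Length
fit <- randomForest(Species ~ Sepal.Length + Sepal.Width + Petal.Width,
                    data = train)

# Pass the whole test data frame; predict() selects the predictors by name
prediction <- predict(fit, newdata = test)

# Predictions go in `data`, true classes in `reference`
confusionMatrix(data = prediction, reference = test$Species)
```

Because the model was fitted with a formula, extra columns in test such as Petal.Length and id are simply ignored at prediction time.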