r, model, prediction

randomForest model doesn't run


I am trying to run a randomForest model on the iris data without the variable Petal.Length. The code gives me errors at the prediction step. How can I code this properly? Thanks for your help. Richard

data (iris)
attach (iris)
iris$id <- 1:nrow(iris)

library (dplyr)
train <- iris %>% 
  sample_frac (0.8)
test <- iris %>% 
  anti_join(train, by = "id")

library (randomForest)
library (caret)

fit <- randomForest(Species ~ 
                      Sepal.Length +Sepal.Width +Petal.Width, data = train,)


prediction <- predict (fit, test [1:2 , 4])

confusionMatrix (test$Species,prediction)

Solution

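  • Your subsetting of the test dataset is wrong: test[1:2, 4] returns only rows 1 and 2 of column 4 as a plain vector, not a data frame containing the predictor columns. Just use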

    prediction <- predict (fit, newdata = test)
    

    in place of

    predict (fit, test [1:2 , 4])
    

    predict() will automatically pick the predictor columns the model needs from newdata. Alternatively, you can drop the unused column explicitly:

    prediction <- predict (fit, subset(test, select = -c(Petal.Length)))
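
    For reference, here is a minimal end-to-end sketch of the corrected workflow. It makes a few assumptions that are not in the original question: a fixed seed (42) for reproducibility, a base R index split instead of sample_frac/anti_join, and a plain table() instead of caret's confusionMatrix, so that only randomForest needs to be installed.

```r
library(randomForest)

set.seed(42)  # assumption: fixed seed so the split is reproducible

# 80/20 split by row index (base R stand-in for sample_frac / anti_join)
train_idx <- sample(nrow(iris), size = 0.8 * nrow(iris))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit the model without Petal.Length
fit <- randomForest(Species ~ Sepal.Length + Sepal.Width + Petal.Width,
                    data = train)

# The original call passed test[1:2, 4], which is a bare vector,
# not a data frame with the predictor columns
is_df <- is.data.frame(test[1:2, 4])

# Pass the whole test data frame; predict() uses only the formula's columns
prediction <- predict(fit, newdata = test)

# Cross-tabulate predictions (rows) against true labels (columns)
conf_table <- table(predicted = prediction, actual = test$Species)
print(conf_table)

accuracy <- sum(diag(conf_table)) / sum(conf_table)
```

    If you prefer caret, note that confusionMatrix expects the predictions first and the true labels second, i.e. confusionMatrix(data = prediction, reference = test$Species), which is the reverse of the order used in the question.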