r, classification, prediction, r-caret

How to predict on a new data set using a trained classifier in R?


I would like to use a trained classifier to predict a variable (the iris Species) for new data. How can I do this in R? For simplicity, I generated an artificial iris_unknown set that does not contain the Species variable. I would like the classifier to predict the Species variable for iris_unknown.

library(caret)

trainIndex <- caret::createDataPartition(iris$Species, p = 0.5, list = FALSE)
irisTrain <- iris[ trainIndex,]
iris_unknown  <- iris[-trainIndex,][,-5] # drop the last column (Species) so the label is unknown
model_nnet <- train(irisTrain, irisTrain$Species, method = 'nnet', importance = TRUE)

pred_annFit <- predict(model_nnet, newdata = iris_unknown)

I get this error:

Error: 'eval(predvars, data, env)': object 'Species' not found

Solution

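  • You passed the Species column as part of the predictors when training the model. You should not do that, because the model then treats Species as a predictor and looks for it at prediction time, which is why predict() fails on data without it. Train on the predictor columns only. Once the model is trained without the label, it doesn't matter whether the test data.frame includes the label, because the model will not use that column. So something like this: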

    iris_unknown  <- iris[-trainIndex,-5]
    model_nnet <- train(irisTrain[,-5], irisTrain$Species, method = 'nnet', importance = TRUE)
    
    pred_annFit <- predict(model_nnet, newdata = iris_unknown)
    

    The result is a factor vector, which you can put into your data frame:

    str(pred_annFit)
     Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
    

    Let's put the predicted vector back in:

    iris_unknown$prediction = pred_annFit
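
Since the "unknown" rows here were actually held out from iris, the true labels are still available, so the predictions can be checked. The following is a minimal, self-contained sketch of that check, not part of the original answer. The names irisTest, pred_without and pred_with are made up for illustration, and set.seed(42) and trace = FALSE are assumptions added only to make the run repeatable and quiet. It also predicts once with and once without the Species column in newdata, to see whether the label column is ignored as described above.

```r
library(caret)

set.seed(42)  # assumption: fixed seed so the split and the fit are repeatable

# Split iris in half, stratified by Species
trainIndex <- createDataPartition(iris$Species, p = 0.5, list = FALSE)
irisTrain  <- iris[trainIndex, ]
irisTest   <- iris[-trainIndex, ]  # hypothetical name; still holds the true Species

# Train on the four predictor columns only.
# trace = FALSE is passed through to nnet to silence its iteration log.
model_nnet <- train(irisTrain[, -5], irisTrain$Species,
                    method = "nnet", trace = FALSE)

# Predict once without and once with the label column present in newdata
pred_without <- predict(model_nnet, newdata = irisTest[, -5])
pred_with    <- predict(model_nnet, newdata = irisTest)

# Check whether the extra Species column changed anything
print(identical(pred_without, pred_with))

# Compare predictions against the held-out true labels
print(table(predicted = pred_without, actual = irisTest$Species))
print(mean(pred_without == irisTest$Species))  # share of correct predictions
```

With genuinely unlabeled data the last two lines are not possible, and only the predict() call applies.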