I am using the caret package to train a random forest model on a training dataset. I used 10-fold cross-validation to get an object, say randomForestFit. Now I would like to use this object to predict on a new data set, say test_data. I also want to get the respective class probabilities. How would I do that?
I have been using the extractProb function as follows:
extractProb(randomForestFit, textX = test_data_predictors, testY = test_data_labels)
But it's giving me unexpected results.
From the example on the extractProb help page, you need to wrap the model in a list:
library(caret)

# Fit two classifiers on iris with cross-validation
knnFit <- train(Species ~ ., data = iris, method = "knn",
                trControl = trainControl(method = "cv"))
rdaFit <- train(Species ~ ., data = iris, method = "rda",
                trControl = trainControl(method = "cv"))

# Single model: predicted classes, then class probabilities
predict(knnFit)
predict(knnFit, type = "prob")

# extractPrediction and extractProb take a list of train objects
bothModels <- list(knn = knnFit,
                   tree = rdaFit)
predict(bothModels)
extractPrediction(bothModels, testX = iris[1:10, -5])
extractProb(bothModels, testX = iris[1:10, -5])
So the following should work:
extractProb(list(randomForestFit), testX = test_data_predictors, testY = test_data_labels)
Note that the argument is spelled testX, not textX as in the question.
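For a concrete end-to-end picture, here is a minimal sketch that uses iris as a stand-in for your data. The train/test split, the seed, and the name test_probs are assumptions made for illustration, and method = "rf" assumes the randomForest package is installed. One thing to be aware of is that, when test data is supplied, extractProb also returns rows for the training data, so you may want to filter on the dataType column. If you only need probabilities for new data, predict with type = "prob" on the train object is the shorter route.

```r
library(caret)  # method = "rf" also needs the randomForest package

set.seed(42)
# Hypothetical split of iris standing in for the real training/test data
in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_data <- iris[in_train, ]
test_data_predictors <- iris[-in_train, -5]
test_data_labels <- iris$Species[-in_train]

# Random forest tuned with 10-fold cross-validation
randomForestFit <- train(Species ~ ., data = train_data, method = "rf",
                         trControl = trainControl(method = "cv", number = 10))

# Wrap the single model in a (named) list for extractProb
probs <- extractProb(list(rf = randomForestFit),
                     testX = test_data_predictors,
                     testY = test_data_labels)

# The result holds training and test rows; keep only the test rows
test_probs <- probs[probs$dataType == "Test", ]
head(test_probs)

# Shorter alternative: class probabilities straight from the train object
direct_probs <- predict(randomForestFit, newdata = test_data_predictors,
                        type = "prob")
head(direct_probs)
```
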
edit:
And yes, the preprocessing will be used. From the documentation:
These processing steps would be applied during any predictions generated using predict.train, extractPrediction or extractProbs (see details later in this document). The pre-processing would not be applied to predictions that directly use the object$finalModel object.
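To see what that passage means in practice, here is a small sketch, assuming a k-nearest-neighbours model with centering and scaling (the model choice, the seed, and the names new_x, via_train, and via_final are made up for illustration). Predicting through the train object applies the stored pre-processing for you; going through finalModel directly only matches if you apply the pre-processing yourself first.

```r
library(caret)

set.seed(1)
# Hypothetical model with centering and scaling as pre-processing
knnFit <- train(x = iris[, -5], y = iris$Species, method = "knn",
                preProcess = c("center", "scale"),
                trControl = trainControl(method = "cv"))

new_x <- iris[c(1:5, 51:55, 101:105), -5]

# predict on the train object applies the stored pre-processing
via_train <- predict(knnFit, newdata = new_x, type = "prob")

# finalModel knows nothing about it, so pre-process by hand first
scaled_x <- predict(knnFit$preProcess, new_x)
via_final <- predict(knnFit$finalModel, as.matrix(scaled_x), type = "prob")

# Compare the two routes, ignoring row and column name attributes
isTRUE(all.equal(as.matrix(via_train), via_final, check.attributes = FALSE))
```

Passing the unscaled new_x straight to knnFit$finalModel would run without an error but would compute distances on the wrong scale, which is why sticking to predict on the train object (or extractProb) is usually the safer habit.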