
How to use a trained caret object to predict on new data (not used during training)?


I am using the caret package to train a random forest model on a training dataset. I used 10-fold cross-validation to obtain a fitted object, say randomForestFit. Now I would like to use this object to predict on a new dataset, say test_data, and also get the corresponding class probabilities. How would I do that?

I have been using the extractProb function as follows:

extractProb(randomForestFit, testX = test_data_predictors, testY = test_data_labels)

But it's giving me unexpected results.


Solution

  • As the example on the extractProb help page shows, the first argument must be a list of train objects, so you need to wrap the model in a list:

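    library(caret)

    # Fit two models with cross-validation
    knnFit <- train(Species ~ ., data = iris, method = "knn", 
                    trControl = trainControl(method = "cv"))
    
    rdaFit <- train(Species ~ ., data = iris, method = "rda", 
                    trControl = trainControl(method = "cv"))
    
    # Predictions on the training data: class labels, then class probabilities
    predict(knnFit)
    predict(knnFit, type = "prob")
    
    # Note: the element named "tree" holds the RDA fit
    bothModels <- list(knn = knnFit,
                       tree = rdaFit)
    
    predict(bothModels)
    
    # Both functions take a list of train objects plus the new predictors
    extractPrediction(bothModels, testX = iris[1:10, -5])
    extractProb(bothModels, testX = iris[1:10, -5])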
    

    So the following should work:

    extractProb(list(randomForestFit), testX = test_data_predictors, testY = test_data_labels)
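
    If only the predictions for the new data are needed, predict with newdata is usually the more direct route. Below is a minimal sketch, assuming an iris split as a stand-in for the asker's data (the split, the seed and the list name rf are illustrative assumptions, and method = "rf" needs the randomForest package installed). It also shows that extractProb returns rows for the training data as well, which may be the source of the unexpected results, so the output is filtered on dataType:

    ```r
    library(caret)

    set.seed(42)
    # Hypothetical stand-ins for the asker's objects, built from iris
    idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
    train_data <- iris[idx, ]
    test_data_predictors <- iris[-idx, -5]
    test_data_labels <- iris$Species[-idx]

    # Random forest tuned with 10-fold cross-validation
    randomForestFit <- train(Species ~ ., data = train_data, method = "rf",
                             trControl = trainControl(method = "cv", number = 10))

    # Class probabilities and predicted classes for the new data
    probs <- predict(randomForestFit, newdata = test_data_predictors, type = "prob")
    classes <- predict(randomForestFit, newdata = test_data_predictors)

    # extractProb stacks training and test rows; keep only the test rows
    allProbs <- extractProb(list(rf = randomForestFit),
                            testX = test_data_predictors,
                            testY = test_data_labels)
    testProbs <- subset(allProbs, dataType == "Test")
    ```

    Here probs has one column per class and one row per new observation, while testProbs additionally carries the obs, pred, model and dataType columns.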
    

    edit:

    And yes, any pre-processing specified in train will be applied when predicting on new data. From the documentation:

    These processing steps would be applied during any predictions generated using predict.train, extractPrediction or extractProbs (see details later in this document). The pre-processing would not be applied to predictions that directly use the object$finalModel object.
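
    The difference can be illustrated with a small sketch, assuming a knn model with centering and scaling on iris (the model choice and the object names fit, newX, p1 and p2 are illustrative, not from the question). predict on the train object applies the stored pre-processing itself, whereas predicting from finalModel requires applying it by hand first:

    ```r
    library(caret)

    set.seed(1)
    # knn with centering and scaling estimated from the training data
    fit <- train(Species ~ ., data = iris, method = "knn",
                 preProcess = c("center", "scale"),
                 trControl = trainControl(method = "cv"))

    newX <- iris[1:10, -5]

    # predict.train centers and scales newX before predicting
    p1 <- predict(fit, newdata = newX, type = "prob")

    # finalModel knows nothing about the pre-processing, so apply it manually
    newX_pp <- predict(fit$preProcess, newX)
    p2 <- predict(fit$finalModel, newdata = newX_pp, type = "prob")
    ```

    Passing the raw newX straight to fit$finalModel would skip the centering and scaling and give predictions on the wrong scale.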