There seem to be differences between the ROC/Sens/Spec produced when tuning the model and those obtained from the actual predictions the model makes on the same dataset. I'm using caret, which uses kernlab's ksvm. I'm not experiencing this problem with glm.
data(iris)
library(caret)
iris <- subset(iris, Species == "versicolor" | Species == "setosa") # we need only two output classes
iris$noise <- runif(nrow(iris)) # add noise - otherwise the model is too "perfect"
iris$Species <- factor(iris$Species)
fitControl <- trainControl(method = "repeatedcv",number = 10, repeats = 5, savePredictions = TRUE, classProbs = TRUE, summaryFunction = twoClassSummary)
ir <- train(Species ~ Sepal.Length + noise, data=iris,method = "svmRadial", preProc = c("center", "scale"), trControl=fitControl,metric="ROC")
confusionMatrix(predict(ir), iris$Species, positive = "setosa")
getTrainPerf(ir) # same as in the model summary
What is the source of this discrepancy? Which ones are the "real", post-cross-validation predictions?
It seems the function getTrainPerf gives the mean performance for the best tuning parameters, averaged across the repeated cross-validation folds. Here is how getTrainPerf works:
getTrainPerf(ir)
# TrainROC TrainSens TrainSpec method
#1 0.9096 0.844 0.884 svmRadial
which is obtained as follows:
ir$results
# sigma C ROC Sens Spec ROCSD SensSD SpecSD
#1 0.7856182 0.25 0.9064 0.860 0.888 0.09306044 0.1355262 0.1222911
#2 0.7856182 0.50 0.9096 0.844 0.884 0.08882360 0.1473023 0.1218229
#3 0.7856182 1.00 0.8968 0.836 0.884 0.09146071 0.1495026 0.1218229
ir$bestTune
# sigma C
#2 0.7856182 0.5
merge(ir$results, ir$bestTune)
# sigma C ROC Sens Spec ROCSD SensSD SpecSD
#1 0.7856182 0.5 0.9096 0.844 0.884 0.0888236 0.1473023 0.1218229
The same values can also be obtained from the performance results on the individual cross-validation folds (10 folds, 5 repeats, so 10*5 = 50 values in total for each performance measure).
colMeans(ir$resample[1:3])
# ROC Sens Spec
# 0.9096 0.8440 0.8840
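As a quick sanity check, each of those 50 rows can be recomputed from the hold-out predictions that train stored because savePredictions = TRUE was set. The sketch below assumes caret's usual "Fold01.Rep1" labelling of the resamples from repeated CV:
cvPred <- merge(ir$pred, ir$bestTune) # hold-out predictions for the best sigma/C only
oneFold <- subset(cvPred, Resample == "Fold01.Rep1") # a single fold of a single repeat
twoClassSummary(oneFold, lev = levels(oneFold$obs)) # recomputed ROC/Sens/Spec for that fold
subset(ir$resample, Resample == "Fold01.Rep1")[1:3] # should match the stored values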
Hence, getTrainPerf only gives a summary of the cross-validation performance measured on the held-out validation folds (not on the entire training dataset), for the best tuning parameters (sigma, C).
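If what you are after are the actual held-out (post-cross-validation) predictions themselves, they are kept in ir$pred because savePredictions = TRUE. A rough sketch of pooling them into a single confusion matrix (keep in mind that with repeats = 5 every observation is predicted five times, once per repeat):
cvPred <- merge(ir$pred, ir$bestTune) # keep only the rows for the best sigma/C
nrow(cvPred) # 100 observations x 5 repeats = 500 held-out predictions
confusionMatrix(cvPred$pred, cvPred$obs, positive = "setosa") # pooled cross-validation confusion matrix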
But if you want predictions on your entire training dataset, you need to use the predict function with the tuned model.
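For comparison, here is a sketch of what the confusionMatrix(predict(ir), ...) call in the question measures: resubstitution (apparent) performance of the final model, which caret refits on the whole training set with the best sigma/C, so these numbers will typically look more optimistic than the cross-validation estimates above.
trainPred <- predict(ir) # class predictions for the full training set from the final model
trainProb <- predict(ir, newdata = iris, type = "prob") # class probabilities (available because classProbs = TRUE)
head(trainProb) # per-class probabilities from the final model
confusionMatrix(trainPred, iris$Species, positive = "setosa") # apparent / resubstitution performance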