I'm using the randomForest package in R to build a model that classifies cases as disease (1) or disease free (0):
classify_BV_100t <- randomForest(bv.disease~., data=RF_input_BV_clean, ntree = 100, localImp = TRUE)
randomForest(formula = bv.disease ~ ., data = RF_input_BV_clean, ntree = 100, localImp = TRUE)
Type of random forest: classification
Number of trees: 100
No. of variables tried at each split: 53
OOB estimate of error rate: 8.04%
Confusion matrix:
    0  1 class.error
0 510  7  0.01353965
1  39 16  0.70909091
My confusion matrix shows that the model is good at classifying 0 (no disease) but very bad at classifying 1 (disease).
But when I plot ROC curves, they give the impression that the model is pretty good.
Here are the 2 different ways I plotted ROC:
rf.roc<-roc(RF_input_BV_clean$bv.disease, classify_BV_100t$votes[,2])
(Using How to compute ROC and AUC under ROC after training using caret in R?)
predictions <- as.vector(classify_BV_100t$votes[,2])
pred <- prediction(predictions, RF_input_BV_clean$bv.disease)
perf_AUC <- performance(pred,"auc") #Calculate the AUC value
AUC <- perf_AUC@y.values[[1]]
perf_ROC <- performance(pred,"tpr","fpr") #plot the actual ROC curve
plot(perf_ROC, main="ROC plot")
text(0.5,0.5,paste("AUC = ",format(AUC, digits=5, scientific=FALSE)))
These are the ROC plots from 1 and 2:
Both methods give me an AUC of 0.8621593.
Does anyone know why the results from the random forest confusion matrix don't seem to add up with the ROC/AUC?
I don't believe there is anything wrong with your ROC plots, and your assessment of the discrepancy is right on.
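The high AUC and the poor class 1 error rate are measuring different things. A ROC curve plots sensitivity (the true positive rate) against 1 - specificity (the false positive rate) over every possible cutoff on the vote fraction, whereas the confusion matrix reports the result at a single cutoff, the default majority vote. At that cutoff your specificity is very high (about 0.99) and your sensitivity is low (about 0.29), and because there are far more disease-free cases than disease cases (517 vs. 55), the high specificity dominates the overall error rate. An AUC of 0.86 says the model usually ranks a disease case above a disease-free one; it does not say the default cutoff separates them well. So yes, it's a high AUC, but as you mentioned, at the default cutoff the model is only good at predicting negatives.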
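To illustrate how a single cutoff and the AUC can disagree, here is a minimal base-R sketch. The scores are made up for illustration (they are not your data), and the 0.26 cutoff is an arbitrary choice; the point is only that sensitivity depends on where the cutoff sits, while the AUC does not.

```r
# Hypothetical scores (fraction of trees voting "disease"); not the asker's data.
neg_scores <- c(0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25, 0.30, 0.55)  # true class 0
pos_scores <- c(0.18, 0.28, 0.35, 0.40, 0.60)                                # true class 1

# AUC as the probability that a random positive outscores a random negative
# (ties count one half).
auc <- mean(outer(pos_scores, neg_scores, ">") +
            0.5 * outer(pos_scores, neg_scores, "=="))

# Sensitivity and specificity at one cutoff: predict "disease" when score > cutoff.
rates_at <- function(cutoff) {
  c(sensitivity = mean(pos_scores > cutoff),
    specificity = mean(neg_scores <= cutoff))
}

at_default <- rates_at(0.50)  # majority vote
at_lower   <- rates_at(0.26)  # a lower, arbitrarily chosen cutoff

print(auc)
print(at_default)
print(at_lower)
```

On these toy numbers the AUC is 0.84, yet sensitivity at the 0.5 cutoff is only 0.2 (with specificity 0.9); lowering the cutoff to 0.26 gives 0.8 for both. If a lower cutoff suits your problem, randomForest() has a `cutoff` argument you could look into.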
I'd recommend calculating additional metrics (sensitivity, i.e. the true positive rate; specificity; false positive rate; ...) and evaluating them together as you assess your model. AUC is a useful quality metric, but it means a lot more with additional metrics behind it.
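As a rough sketch, these metrics can be computed directly from the counts in your confusion matrix. The variable names are my own, and precision and balanced accuracy are extra metrics I've added for illustration.

```r
# Counts copied from the confusion matrix in the question
# (rows = actual class, columns = predicted class).
cm <- matrix(c(510, 39, 7, 16), nrow = 2,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

TN <- cm["0", "0"]; FP <- cm["0", "1"]
FN <- cm["1", "0"]; TP <- cm["1", "1"]

sensitivity <- TP / (TP + FN)   # true positive rate: 16/55, about 0.291
specificity <- TN / (TN + FP)   # true negative rate: 510/517, about 0.986
fpr         <- 1 - specificity  # false positive rate
error_rate  <- (FP + FN) / sum(cm)  # 46/572, about 0.0804 (the OOB error)

# Extra metrics, not mentioned above (my additions):
precision   <- TP / (TP + FP)   # 16/23, about 0.696
balanced_accuracy <- (sensitivity + specificity) / 2

print(round(c(sensitivity = sensitivity, specificity = specificity,
              fpr = fpr, error_rate = error_rate,
              precision = precision,
              balanced_accuracy = balanced_accuracy), 4))
```

The low overall error rate comes almost entirely from the 517 disease-free cases; the sensitivity of about 0.29 is what the 0.709 class error for class 1 is telling you.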