I am working on a classification project, and I am evaluating different ML models based on their training accuracy, testing accuracy, confusion matrix, and AUC score. I am now stuck on understanding the difference between the score I get by calculating the accuracy of an ML model on the test set (X_test) and the AUC score.
If I am correct, both metrics measure how well an ML model is able to predict the correct class of previously unseen data. I also understand that for both, the higher the number, the better, as long as the model is neither over-fit nor under-fit.
Assuming an ML model is neither over-fit nor under-fit, what is the difference between the test accuracy score and the AUC score?
I don't have a background in math and stats, and pivoted towards data science from a business background. Therefore, I would appreciate an explanation a business person can understand.
Both terms quantify the quality of a classification model. However, the accuracy quantifies a single manifestation of the variables, which means it describes a single confusion matrix. The AUC (area under the curve) represents the trade-off between the true-positive-rate (tpr) and the false-positive-rate (fpr) across multiple confusion matrices that are generated for different fpr values for the same classifier.
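A confusion matrix is of the form:

$$
\begin{array}{l|cc}
 & \text{predicted positive} & \text{predicted negative} \\ \hline
\text{actual positive} & tp & fn \\
\text{actual negative} & fp & tn
\end{array}
$$

1) The accuracy is a measure for a single confusion matrix and is defined as:

$$\text{accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$$

where tp = true-positives, tn = true-negatives, fp = false-positives and fn = false-negatives (the count of each).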
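As a minimal sketch of the accuracy definition above, the function below computes it from the four counts of one confusion matrix. The function name `accuracy` and the example counts are made up for illustration and do not come from any particular dataset.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct, for one confusion matrix."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts: 100 test samples, 85 of them classified correctly.
example_accuracy = accuracy(tp=40, tn=45, fp=5, fn=10)
print(example_accuracy)
```

Note that this number belongs to one fixed decision threshold; a different threshold gives a different confusion matrix and therefore a different accuracy.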
2) The AUC measures the area under the ROC (receiver operating characteristic) curve, that is, the trade-off curve between the true-positive-rate and the false-positive-rate. For each choice of the false-positive-rate (fpr) threshold, the true-positive-rate (tpr) is determined. I.e., for a given classifier an fpr of 0, 0.1, 0.2 and so forth is accepted, and for each fpr its corresponding tpr is evaluated. Therefore, you get a function tpr(fpr) that maps the interval [0,1] onto the same interval, because both rates are defined on that interval. The area under this curve is called the AUC. It lies between 0 and 1, and a random classification is expected to yield an AUC of 0.5.
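The following is a rough sketch of how the points of such a curve can be generated. It assumes the classifier outputs a score per sample and that a sample is predicted positive when its score reaches a cut-off; sweeping that cut-off is one common way to obtain the different fpr values, and each cut-off corresponds to one confusion matrix. The function name `roc_points`, the labels, and the scores are hypothetical.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) pairs, one per distinct score cut-off, plus the origin."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cut-off above every score: nothing predicted positive
    for cutoff in sorted(set(scores), reverse=True):
        predicted = [s >= cutoff for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical test set: 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
for fpr, tpr in points:
    print(f"fpr={fpr:.2f}  tpr={tpr:.2f}")
```

The curve always starts at (0, 0) and ends at (1, 1); how quickly the tpr rises before the fpr does is what the AUC summarizes.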
The AUC, as it is the area under the curve, is defined as:

$$AUC = \int_0^1 tpr(fpr) \, d\,fpr$$
However, in real (and finite) applications, the ROC is a step function and the AUC is determined by a weighted sum of these levels.
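As an illustration of that weighted sum, the sketch below adds up the tpr levels of a step-shaped ROC, each weighted by the width of the fpr interval it covers. It averages the two endpoints of every segment (the trapezoid rule), which gives the same result for flat steps and also covers diagonal segments that appear when scores are tied. The function name `auc_from_points` and the ROC points are hypothetical.

```python
def auc_from_points(points):
    """Area under a ROC given as (fpr, tpr) pairs sorted by increasing fpr."""
    area = 0.0
    for (fpr_prev, tpr_prev), (fpr_next, tpr_next) in zip(points, points[1:]):
        width = fpr_next - fpr_prev          # weight of this level
        height = (tpr_prev + tpr_next) / 2   # level (average of both endpoints)
        area += width * height
    return area


# Hypothetical step-shaped ROC for a test set with 3 positives and 3 negatives.
roc = [(0, 0), (0, 1/3), (0, 2/3), (1/3, 2/3), (1/3, 1), (2/3, 1), (1, 1)]
auc = auc_from_points(roc)
print(round(auc, 4))

# A classifier that guesses at random follows the diagonal.
random_auc = auc_from_points([(0, 0), (1, 1)])
print(random_auc)
```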
Graphics are from Borgelt's Intelligent Data Mining Lecture.