Tags: machine-learning, classification, auc

In Classification, what is the difference between the test accuracy and the AUC score?


I am working on a classification project, and I am evaluating different ML models based on their training accuracy, testing accuracy, confusion matrix, and AUC score. I am now stuck on the difference between the score I get by calculating the accuracy of a model on the test set (X_test) and the AUC score.

If I am correct, both metrics measure how well an ML model is able to predict the correct class of previously unseen data. I also understand that for both, the higher the number the better, as long as the model is not over-fit or under-fit.

Assuming an ML model is neither over-fit nor under-fit, what is the difference between the test accuracy score and the AUC score?

I don't have a background in math and statistics, and I pivoted towards data science from a business background. I would therefore appreciate an explanation that a business person can understand.


Solution

  • Both terms quantify the quality of a classification model, but they do so differently. The accuracy quantifies a single manifestation of the model's decisions: it describes one confusion matrix, obtained at one fixed decision threshold. The AUC (area under the curve) instead represents the trade-off between the true-positive rate (tpr) and the false-positive rate (fpr) across the multiple confusion matrices that the same classifier produces as its decision threshold is varied. A confusion matrix is of the form:

                           predicted positive     predicted negative
       actual positive     TP (true positives)    FN (false negatives)
       actual negative     FP (false positives)   TN (true negatives)

    1) The accuracy is a measure of a single confusion matrix and is defined as: accuracy = (TP + TN) / (TP + FP + TN + FN)

    where TP = true positives, TN = true negatives, FP = false positives and FN = false negatives (the count of each).
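
    For illustration, here is a minimal sketch of this computation using scikit-learn; the arrays y_test and y_pred are hypothetical stand-ins for your test labels and a model's hard class predictions:

    ```python
    from sklearn.metrics import accuracy_score, confusion_matrix

    # Hypothetical test labels and hard 0/1 predictions of some classifier
    y_test = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    # One fixed set of predictions -> one confusion matrix -> one accuracy value
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    print((tp + tn) / (tp + fp + tn + fn))  # manual formula: 0.75
    print(accuracy_score(y_test, y_pred))   # same value via scikit-learn
    ```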

    2) The AUC measures the area under the ROC curve (receiver operating characteristic), which is the trade-off curve between the true-positive rate and the false-positive rate. Each choice of the classifier's decision threshold yields one (fpr, tpr) pair: i.e., for a given classifier an fpr of 0, 0.1, 0.2 and so forth is accepted, and for each fpr the corresponding tpr is evaluated. You therefore get a function tpr(fpr) that maps the interval [0, 1] onto the same interval, because both rates are defined on that interval. The area under this curve is called the AUC; it lies between 0 and 1, and a random classifier is expected to yield an AUC of 0.5. (A minimal sketch of this threshold sweep follows the figures below.)

    [Figures: example ROC curves, tpr plotted against fpr]
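
    To make the threshold sweep concrete, here is a minimal sketch in plain NumPy; y_test and y_score are hypothetical test labels and predicted probabilities, and the thresholds are an arbitrary choice:

    ```python
    import numpy as np

    # Hypothetical test labels and predicted probabilities of the positive class
    y_test  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    y_score = np.array([0.9, 0.4, 0.65, 0.8, 0.3, 0.55, 0.7, 0.2])

    # Each decision threshold yields one confusion matrix, i.e. one (fpr, tpr) point
    for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
        y_pred = (y_score >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_test == 1))
        fp = np.sum((y_pred == 1) & (y_test == 0))
        fn = np.sum((y_pred == 0) & (y_test == 1))
        tn = np.sum((y_pred == 0) & (y_test == 0))
        print(f"threshold={t:.2f}  fpr={fp / (fp + tn):.2f}  tpr={tp / (tp + fn):.2f}")
    ```

    Plotting all such (fpr, tpr) pairs as the threshold varies traces out the ROC curve.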

    The AUC, as it is the area under the curve, is defined as:

    $$\mathrm{AUC} = \int_0^1 \mathrm{tpr}(\mathrm{fpr}) \,\mathrm{d}\,\mathrm{fpr}$$

    However, in real (and finite) applications, the ROC is a step function, and the AUC is determined by a weighted sum of these step levels (in effect, the trapezoidal rule), as in the sketch below.
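
    In practice you rarely sum the steps by hand; a minimal sketch using scikit-learn's roc_curve and auc (again with the hypothetical y_test and y_score from above) might look like this:

    ```python
    import numpy as np
    from sklearn.metrics import auc, roc_auc_score, roc_curve

    # Hypothetical test labels and predicted probabilities of the positive class
    y_test  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    y_score = np.array([0.9, 0.4, 0.65, 0.8, 0.3, 0.55, 0.7, 0.2])

    # roc_curve sweeps all relevant thresholds and returns the step-function points
    fpr, tpr, thresholds = roc_curve(y_test, y_score)

    print(auc(fpr, tpr))                   # area under the steps (trapezoidal rule)
    print(roc_auc_score(y_test, y_score))  # same value computed directly
    ```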

    Graphics are from Borgelt's Intelligent Data Mining Lecture.