
Calculate AUC in R?


Given a vector of scores and a vector of actual class labels, how do you calculate a single-number AUC metric for a binary classifier in the R language or in simple English?

Page 9 of "AUC: a Better Measure..." seems to require knowing the class labels, and here is an example in MATLAB where I don't understand

    R(Actual == 1))

because R (not to be confused with the R language) is defined as a vector but used as a function?


Solution

  • As mentioned by others, you can compute the AUC using the ROCR package. With the ROCR package you can also plot the ROC curve, lift curve and other model selection measures.
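A minimal ROCR sketch, assuming the package is installed from CRAN (`install.packages("ROCR")`). The `scores` and `labels` vectors are made-up toy data. `prediction()` pairs the scores with the labels (the lower label value, here 0, is treated as the negative class), and `performance()` computes either the AUC, stored in the `y.values` slot, or the curves mentioned above:

```r
library(ROCR)

# Hypothetical toy data: classifier scores and true 0/1 class labels
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,    1,   0,   0)

pred <- prediction(scores, labels)

# Single-number AUC, extracted from the S4 performance object
auc_value <- performance(pred, measure = "auc")@y.values[[1]]
print(auc_value)

# ROC curve: true positive rate against false positive rate
plot(performance(pred, "tpr", "fpr"))

# Lift curve: lift against the rate of positive predictions
plot(performance(pred, "lift", "rpp"))
```

On this toy data the AUC is 13/16 = 0.8125.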

    You can also compute the AUC directly, without any package, by using the fact that the AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative example.
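A base-R sketch of the exact computation. The helper names `auc_pairwise` and `auc_rank` are hypothetical, and ties are counted as one half, which is a common convention. `auc_pairwise` compares every positive with every negative; `auc_rank` gets the same number from ranks (the Mann-Whitney U statistic) without building the full comparison matrix. Its `r[labels == 1]` is logical indexing. Assuming the `R` in the MATLAB example holds the ranks of the scores, `R(Actual == 1)` does the same thing there, because MATLAB uses parentheses for array indexing as well as for function calls.

```r
# Exact AUC: fraction of (positive, negative) pairs where the positive
# scores higher, with ties counted as 1/2
auc_pairwise <- function(pos.scores, neg.scores) {
  wins <- outer(pos.scores, neg.scores, ">")   # n_pos x n_neg logical matrix
  ties <- outer(pos.scores, neg.scores, "==")
  mean(wins + 0.5 * ties)
}

# Same quantity via ranks (Mann-Whitney U divided by n_pos * n_neg)
auc_rank <- function(scores, labels) {
  r <- rank(scores)                  # tied scores get their average rank
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels != 1)
  # r[labels == 1] picks out the ranks of the positive examples
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical toy data
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,    1,   0,   0)

auc_pairwise(scores[labels == 1], scores[labels == 0])
auc_rank(scores, labels)
```

Both calls return 0.8125, because 13 of the 16 positive/negative pairs are ordered correctly. The rank version needs only a sort, so it scales to large data sets where the pairwise matrix would not fit in memory.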

    For example, if pos.scores is a vector containing the scores of the positive examples and neg.scores is a vector containing the scores of the negative examples, then the following approximates the AUC by comparing 1000 randomly sampled positive/negative pairs:

    > mean(sample(pos.scores, 1000, replace = TRUE) > sample(neg.scores, 1000, replace = TRUE))
    [1] 0.7261

    You can also estimate the variance of the AUC by bootstrapping:

    > aucs <- replicate(1000, mean(sample(pos.scores, 1000, replace = TRUE) > sample(neg.scores, 1000, replace = TRUE)))
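A self-contained sketch of the sampling commands above, run on hypothetical synthetic data (normally distributed scores, positives shifted upward). It also computes `var(aucs)`, which the last command leaves implicit. Each of the 1000 comparisons draws an independent positive/negative pair, so the spread of `aucs` is essentially the Monte Carlo noise of the 1000-pair approximation (roughly AUC(1 - AUC)/1000) and not the uncertainty caused by having finite data. The last part therefore shows a swapped-in alternative that is not part of the answer: a standard bootstrap that resamples each class at its original size and computes the exact AUC on every resample (`exact_auc` is a hypothetical helper).

```r
set.seed(1)

# Hypothetical synthetic scores: positives tend to score higher
pos.scores <- rnorm(200, mean = 1)
neg.scores <- rnorm(300, mean = 0)

# Monte Carlo approximation: fraction of 1000 random pairs ordered correctly
auc_mc <- mean(sample(pos.scores, 1000, replace = TRUE) >
               sample(neg.scores, 1000, replace = TRUE))

# Repeat the approximation 1000 times and look at its spread
aucs <- replicate(1000, mean(sample(pos.scores, 1000, replace = TRUE) >
                             sample(neg.scores, 1000, replace = TRUE)))
var(aucs)
sd(aucs)

# Alternative: resample each class at its original size, exact AUC each time
exact_auc <- function(p, n) mean(outer(p, n, ">") + 0.5 * outer(p, n, "=="))
boot_aucs <- replicate(1000, exact_auc(sample(pos.scores, replace = TRUE),
                                       sample(neg.scores, replace = TRUE)))
var(boot_aucs)
quantile(boot_aucs, c(0.025, 0.975))  # rough 95% percentile interval
```

With these class sizes the exact bootstrap costs 60,000 comparisons per resample, which is still fast; for much larger data a rank-based AUC inside `replicate()` would be the cheaper choice.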