R function colAUC in package caTools fails with large samples


For example:

require(caTools)
colAUC(runif(90000), sample(c(0,1), 90000, replace = TRUE))
             [,1]
0 vs. 1 0.5000629

works fine, however

colAUC(runif(100000), sample(c(0,1), 100000, replace = TRUE))

gives

            [,1]
0 vs. 1   NA
Warning message:
In n1 * n2 : NAs produced by integer overflow

Am I doing something wrong, or is this perhaps a bug? ROCR::performance produces a reasonable answer for samples of this size.


Solution

  • Answering my own question. First, colAUC has a parameter alg, which accepts either "Wilcoxon" or "ROC". The "ROC" option computes the AUC by integrating the ROC curve with the trapezoid rule, which is what I would expect, and it does not fail for larger samples, e.g.

    > colAUC(runif(1000000), sample(c(0,1), 1000000, replace = TRUE), alg = "ROC")
                 [,1]
    0 vs. 1 0.5004179
    

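    However, the default value of alg is "Wilcoxon", and this algorithm computes n1 * n2, where n1 and n2 are calculated by a table statement and so are of type integer. R's integer type is 32-bit (base R has no native 64-bit integer type), so the product overflows once it exceeds .Machine$integer.max (2147483647), and R returns NA with a warning. That also explains the threshold: with 90000 samples split roughly evenly, n1 * n2 is about 45000 * 45000 = 2.025e9, which still fits, whereas with 100000 samples it is about 50000 * 50000 = 2.5e9, which does not. The overflow could be avoided if n1 and n2 were converted to numeric before multiplication. I'll email the package maintainer so he is aware of this issue.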

    UPDATE: I got an email back from the package maintainer saying he has fixed this by casting n1 and n2 to doubles.
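
The overflow and the cast-to-double fix described above can be reproduced in base R without caTools. The following is only a minimal sketch: wilcoxon_auc is a hypothetical helper written for illustration, not the caTools source, and it assumes a 0/1 label vector and the usual rank-sum (Mann-Whitney) form of the AUC.

```r
# Multiplying two integers yields an integer; a result above
# .Machine$integer.max (2147483647) becomes NA with a warning.
n1 <- 50000L
n2 <- 50000L
overflowed <- suppressWarnings(n1 * n2)   # NA
safe <- as.numeric(n1) * as.numeric(n2)   # 2.5e9, computed in double precision

# Hypothetical helper (an assumption, not caTools code): rank-based AUC
# with the class counts converted to double before they are multiplied.
wilcoxon_auc <- function(x, y) {
  n1 <- as.numeric(sum(y == 0))           # sum() of a logical is an integer
  n2 <- as.numeric(sum(y == 1))
  r <- rank(x)                            # average ranks for ties
  u <- sum(r[y == 1]) - n2 * (n2 + 1) / 2 # Mann-Whitney U for class 1
  u / (n1 * n2)                           # no overflow: both factors are doubles
}

set.seed(1)
x <- runif(100000)
y <- sample(c(0, 1), 100000, replace = TRUE)
auc <- wilcoxon_auc(x, y)
print(auc)
```

For random scores such as these the result should be close to 0.5 rather than NA. Note that this sketch returns the probability that a class-1 score exceeds a class-0 score, so it can fall slightly below 0.5; it is not claimed to match colAUC's output exactly.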