For example:
require(caTools)
colAUC(runif(90000), sample(c(0,1), 90000, replace = TRUE))
[,1]
0 vs. 1 0.5000629
works fine, however
colAUC(runif(100000), sample(c(0,1), 100000, replace = TRUE))
gives
[,1]
0 vs. 1 NA
Warning message:
In n1 * n2 : NAs produced by integer overflow
Am I doing something wrong, or is this perhaps a bug? ROCR::performance produces a reasonable answer for samples of this size.
Answering my own question. First, colAUC has a parameter alg, which accepts either "Wilcoxon" or "ROC". The "ROC" option computes the AUC by integrating the ROC curve using the trapezoid rule, which is what I would expect, and it does not give an error for larger samples, e.g.
colAUC(runif(1000000), sample(c(0,1), 1000000, replace = TRUE), alg = "ROC")
[,1]
0 vs. 1 0.5004179
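To make the trapezoid-rule idea concrete, here is a minimal base-R sketch. It is not the caTools implementation: the function name auc_trapezoid is hypothetical, labels are assumed to be coded 0/1, and scores are assumed to have no ties.

```r
# Hypothetical helper, not part of caTools: AUC by trapezoid integration
# of the ROC curve. Assumes y is coded 0/1 and x has no tied scores.
auc_trapezoid <- function(x, y) {
  ord <- order(x, decreasing = TRUE)          # sweep the threshold from high to low
  y <- y[ord]
  tpr <- c(0, cumsum(y == 1) / sum(y == 1))   # true positive rate at each step
  fpr <- c(0, cumsum(y == 0) / sum(y == 0))   # false positive rate at each step
  # Trapezoid rule: width in FPR times the average height in TPR
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

perfect <- auc_trapezoid(c(0.1, 0.2, 0.8, 0.9), c(0, 0, 1, 1))
```

All rates here are doubles, so nothing in this calculation multiplies two integer counts together.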
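However, the default value of alg is "Wilcoxon", and this algorithm computes n1 * n2, where n1 and n2 are obtained from a table call and are therefore of type integer. In R that means a 32-bit integer (maximum 2147483647, i.e. .Machine$integer.max); base R has no native 64-bit integer type. With 100000 samples split roughly evenly, n1 * n2 is about 50000 * 50000 = 2.5e9, which exceeds that limit, so the product overflows to NA with the warning shown above. With 90000 samples the product is about 45000 * 45000 = 2.025e9, which still fits, which is why the smaller example works. The overflow could be avoided if n1 and n2 were cast to numeric before multiplication. I'll email the package maintainer so he is aware of this issue.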
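The overflow can be reproduced without caTools. The snippet below is only an illustration using base R; the variable names are made up for the example, and the counts 50000 and 45000 stand in for the roughly even class split in the examples above.

```r
# table() returns integer counts, so products of its entries use integer arithmetic
counts <- table(sample(c(0, 1), 10, replace = TRUE))
count_type <- typeof(counts)

int_max <- .Machine$integer.max                     # largest representable R integer

n1 <- 50000L
n2 <- 50000L
prod_int <- suppressWarnings(n1 * n2)               # exceeds int_max, so R yields NA
prod_dbl <- as.numeric(n1) * as.numeric(n2)         # doubles hold the product exactly

prod_small <- 45000L * 45000L                       # still below int_max, no overflow
```

Without suppressWarnings, the integer product emits the same "NAs produced by integer overflow" warning seen in the question.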
UPDATE: I got an email back from the package maintainer saying he has fixed this by casting n1 and n2 to doubles.
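For anyone stuck on an older caTools version who still wants a rank-based estimate, the following is a rough base-R sketch of the standard Wilcoxon/Mann-Whitney formula with the counts converted to doubles first. The function name auc_wilcoxon is hypothetical, labels are assumed to be coded 0/1, and this is not the package's own code, so results may differ from colAUC in details such as how the direction of the comparison is reported.

```r
# Hypothetical helper, not part of caTools: rank-based AUC with double counts.
# Assumes y is coded 0/1, with 1 as the positive class.
auc_wilcoxon <- function(x, y) {
  n1 <- as.numeric(sum(y == 1))    # cast to double so n1 * n0 cannot overflow
  n0 <- as.numeric(sum(y == 0))
  r <- rank(x)                     # average ranks for tied scores
  u <- sum(r[y == 1]) - n1 * (n1 + 1) / 2   # Mann-Whitney U for the positives
  u / (n1 * n0)
}

set.seed(1)
big <- auc_wilcoxon(runif(100000), sample(c(0, 1), 100000, replace = TRUE))
```

With random scores and labels as in the question, big should come out close to 0.5 rather than NA.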