Not sure if this is a bug or my understanding is flawed, but when I run the following example:
library(caret)
data(mdrr)
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .5)]
preProc <- preProcess(mdrrDescr, c("center", "scale"))
mdrrDescr <- predict(preProc, mdrrDescr)
inTrain <- createDataPartition(mdrrClass)
trainX <- mdrrDescr[inTrain[[1]], ]
trainY <- mdrrClass[inTrain[[1]]]
testX <- mdrrDescr[-inTrain[[1]], ]
testY <- mdrrClass[-inTrain[[1]]]
library(MASS)
ldaFit <- lda(trainX, trainY)
qdaFit <- qda(trainX, trainY)
testProbs <- data.frame(obs = testY,
                        lda = predict(ldaFit, testX)$posterior[, 1],
                        qda = predict(qdaFit, testX)$posterior[, 1])
calPlotData <- caret::calibration(obs ~ lda + qda, data = testProbs, cuts = 5)
> calPlotData$data
I get this result:
# out:
   calibModelVar       bin   Percent     Lower    Upper Count midpoint
1            lda   [0,0.2]  6.521739  2.430775 13.65621     6       10
2            lda (0.2,0.4] 30.232558 20.789989 41.08301    26       30
3            lda (0.4,0.6] 59.375000 46.367688 71.48530    38       50
4            lda (0.6,0.8] 70.909091 61.481025 79.17690    78       70
5            lda   (0.8,1] 85.227273 79.108431 90.11742   150       90
6            qda   [0,0.2] 28.099174 22.529270 34.21445    68       10
7            qda (0.2,0.4] 40.000000 12.155226 73.76219     4       30
8            qda (0.4,0.6] 33.333333  9.924609 65.11245     4       50
9            qda (0.6,0.8] 80.000000 56.338600 94.26660    16       70
10           qda   (0.8,1] 84.426230 79.256188 88.73729   206       90
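However, when I investigate, the counts in these results appear to be doubled. For example, the first bin reports a Count of 6 for lda and 68 for qda, but counting the "Active" observations in that bin directly gives 3 and 34: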
> table(testProbs$obs == "Active" & testProbs$lda <= 0.2)
# out:
FALSE  TRUE
  261     3
> table(testProbs$obs == "Active" & testProbs$qda <= 0.2)
# out:
FALSE  TRUE
  230    34
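The manual checks above can be generalized to every bin. The following is only a rough sketch: binCheck is a hypothetical helper (not part of caret), and it assumes that the bins are equal-width intervals on [0, 1] and that Lower and Upper are exact binomial intervals from binom.test(). Both are guesses about what calibration() does internally, so treat a mismatch as a hint rather than proof.

```r
# Hypothetical helper (not part of caret): count events per probability bin
# and compute an exact binomial interval for the event rate in each bin.
# Assumes equal-width bins on [0, 1] and binom.test() intervals.
binCheck <- function(obs, prob, event = levels(obs)[1], cuts = 5) {
  breaks <- seq(0, 1, length.out = cuts + 1)
  bins <- cut(prob, breaks = breaks, include.lowest = TRUE)
  rows <- lapply(levels(bins), function(b) {
    inBin <- !is.na(bins) & bins == b      # observations falling in this bin
    n <- sum(inBin)                        # all observations in the bin
    k <- sum(obs[inBin] == event)          # events (e.g. "Active") in the bin
    if (n > 0) {
      ci <- binom.test(k, n)$conf.int * 100
      pct <- 100 * k / n
    } else {
      ci <- c(NA_real_, NA_real_)          # empty bin: nothing to estimate
      pct <- NA_real_
    }
    data.frame(bin = b, Percent = pct, Lower = ci[1], Upper = ci[2],
               Count = k, Total = n)
  })
  do.call(rbind, rows)
}
```

Calling binCheck(testProbs$obs, testProbs$lda) and binCheck(testProbs$obs, testProbs$qda) should give one row per bin, with Count in the first bin matching the table() results above (3 and 34) if the assumptions hold.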
This also affects the interval estimates (Lower and Upper in the table). For instance, when I run calibration() with just one of the model columns, the counts are halved and the intervals are wider for the same Percent values:
> calPlotData <- caret::calibration(obs ~ lda, data = testProbs, cuts = 5)
> calPlotData$data
# out:
  calibModelVar       bin   Percent     Lower    Upper Count midpoint
1           lda   [0,0.2]  6.521739  1.365677 17.89644     3       10
2           lda (0.2,0.4] 30.232558 17.182499 46.12533    13       30
3           lda (0.4,0.6] 59.375000 40.644925 76.30159    19       50
4           lda (0.6,0.8] 70.909091 57.101742 82.37003    39       70
5           lda   (0.8,1] 85.227273 76.063784 91.89296    75       90
This was a bug and was fixed by this PR.