I tried to calculate the confusion matrix after fitting the decision tree model below:
library(rpart)
library(caret)
# tree model
tree <- rpart(LoanStatus_B ~ ., data = train, method = 'class')
# confusion matrix
pdata <- predict(tree, newdata = test, type = "class")
confusionMatrix(data = pdata, reference = test$LoanStatus_B, positive = "1")
How can I set the threshold for my confusion matrix? Say I want a predicted probability above 0.2 to be classified as default, which is the positive binary outcome.
Several things to note here. First, make sure you're getting class probabilities when you make your predictions. With type = "class" you were only getting discrete class labels, so what you want would have been impossible. You'll want to use type = "prob" instead, as in my example below.
library(rpart)
data(iris)
iris$Y <- ifelse(iris$Species == "setosa", 1, 0)
# tree model
tree <- rpart(Y ~ Sepal.Width, data = iris, method = 'class')
# class-probability predictions (one column per class)
pdata <- as.data.frame(predict(tree, newdata = iris, type = "prob"))
head(pdata)
# confusion matrix at a 0.5 probability cutoff
table(iris$Y, pdata$`1` > .5)
Next, note that .5 here is just an arbitrary cutoff -- you can change it to whatever you want, as shown below.
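For instance, to mirror the 0.2 cutoff from your question, using the same pdata as above (a quick sketch -- the column name `1` assumes the class levels are 0 and 1, as they are here):
# confusion matrix at a 0.2 probability cutoff
table(iris$Y, pdata$`1` > .2)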
I don't see a reason to use the confusionMatrix function here, since a confusion matrix can be created this simply, in a way that lets you achieve your goal of easily changing the cutoff.
Having said that, if you do want to use the confusionMatrix function, then first create a discrete class prediction based on your custom cutoff, like this:
pdata$my_custom_predicted_class <- ifelse(pdata$`1` > .5, 1, 0)
Again, .5 here is your custom-chosen cutoff and can be anything you want it to be.
# confusionMatrix() expects factors with matching levels
caret::confusionMatrix(data = factor(pdata$my_custom_predicted_class, levels = c(0, 1)),
                       reference = factor(iris$Y, levels = c(0, 1)),
                       positive = "1")
Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 94 19
         1  6 31

               Accuracy : 0.8333
                 95% CI : (0.7639, 0.8891)
    No Information Rate : 0.6667
    P-Value [Acc > NIR] : 3.661e-06

                  Kappa : 0.5989
 Mcnemar's Test P-Value : 0.0164

            Sensitivity : 0.6200
            Specificity : 0.9400
         Pos Pred Value : 0.8378
         Neg Pred Value : 0.8319
             Prevalence : 0.3333
         Detection Rate : 0.2067
   Detection Prevalence : 0.2467
      Balanced Accuracy : 0.7800

       'Positive' Class : 1
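To translate this back to your loan data, just swap in your own objects. A minimal sketch, assuming the tree, test, and LoanStatus_B from your question, and that the class levels are 0 and 1 with "1" meaning default:
library(rpart)
library(caret)
# class probabilities on the test set; column "1" is the predicted probability of default
p_default <- predict(tree, newdata = test, type = "prob")[, "1"]
# flag a loan as default whenever that probability exceeds your 0.2 cutoff
pred_class <- ifelse(p_default > 0.2, 1, 0)
# confusionMatrix() expects factors with matching levels
confusionMatrix(data = factor(pred_class, levels = c(0, 1)),
                reference = factor(test$LoanStatus_B, levels = c(0, 1)),
                positive = "1")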