I want to calculate two confusion matrices for my logistic regression, one using my training data and one using my testing data:
logitMod <- glm(LoanStatus_B ~ ., data=train, family=binomial(link="logit"))
I set the threshold of predicted probability at 0.5:
confusionMatrix(table(predict(logitMod, type="response") >= 0.5,
train$LoanStatus_B == 1))
The code above works well for my training set. However, when I use the test set:
confusionMatrix(table(predict(logitMod, type="response") >= 0.5,
test$LoanStatus_B == 1))
it gives me this error:
Error in table(predict(logitMod, type = "response") >= 0.5, test$LoanStatus_B == : all arguments must have the same length
Why is this? How can I fix this? Thank you!
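I think the problem is in the call to predict: you forgot to provide the new data. Without newdata, predict() returns the fitted values for the data the model was trained on, so the prediction vector has one element per row of train, while test$LoanStatus_B has one element per row of test. That is why table() complains that the arguments do not have the same length. Also, the function confusionMatrix from the caret package computes and displays the confusion matrix for you, so you don't need to table your results before that call.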
Here, I created a toy dataset that includes a representative binary target variable and then I trained a model similar to what you did.
train <- data.frame(LoanStatus_B = as.numeric(rnorm(100)>0.5), b= rnorm(100), c = rnorm(100), d = rnorm(100))
logitMod <- glm(LoanStatus_B ~ ., data=train, family=binomial(link="logit"))
Now you can predict on the data (for example, your training set) and then call confusionMatrix(), which takes two arguments, the predicted classes (data) and the true classes (reference):
library(caret)
# Use your model to make predictions, in this example newdata = training set, but replace with your test set
pdata <- predict(logitMod, newdata = train, type = "response")
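# use caret and compute a confusion matrix; recent caret versions require
# data and reference to be factors with the same levels
confusionMatrix(data = factor(as.numeric(pdata > 0.5), levels = c(0, 1)),
                reference = factor(train$LoanStatus_B, levels = c(0, 1)))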
Here are the results:
Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 66 33
         1  0  1

               Accuracy : 0.67
                 95% CI : (0.5688, 0.7608)
    No Information Rate : 0.66
    P-Value [Acc > NIR] : 0.4625
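To make the length issue concrete, here is a minimal sketch that also scores a separate test set. It only uses base R, so it runs without caret. The make_data() helper, the seed, and the 100/40 row counts are my own assumptions for illustration; they are not from your data.

```r
# Hypothetical toy data with the same columns as the example above
set.seed(42)
make_data <- function(n) {
  data.frame(LoanStatus_B = as.numeric(rnorm(n) > 0.5),
             b = rnorm(n), c = rnorm(n), d = rnorm(n))
}
train <- make_data(100)
test  <- make_data(40)

logitMod <- glm(LoanStatus_B ~ ., data = train, family = binomial(link = "logit"))

# Without newdata, predict() returns fitted values for the training rows
length(predict(logitMod, type = "response"))   # equals nrow(train)

# With newdata = test, there is one prediction per test row
p_test <- predict(logitMod, newdata = test, type = "response")
length(p_test)                                 # equals nrow(test)

# Fix the levels so both classes show up even if one is never predicted
pred_class <- factor(as.numeric(p_test >= 0.5), levels = c(0, 1))
true_class <- factor(test$LoanStatus_B, levels = c(0, 1))

# Base R confusion matrix: rows are predictions, columns are true classes
cm <- table(Prediction = pred_class, Reference = true_class)
print(cm)

# Accuracy is the share of observations on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

If caret is loaded, confusionMatrix(data = pred_class, reference = true_class) should report the same table together with the extra statistics shown above.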