r validation logistic-regression confusion-matrix

confusionMatrix for logistic regression in R


I want to calculate two confusion matrices for my logistic regression, one using my training data and one using my testing data:

logitMod <- glm(LoanStatus_B ~ ., data=train, family=binomial(link="logit"))

I set the threshold for the predicted probability at 0.5:

confusionMatrix(table(predict(logitMod, type="response") >= 0.5,
                      train$LoanStatus_B == 1))

The code above works well for my training set. However, when I use the test set:

confusionMatrix(table(predict(logitMod, type="response") >= 0.5,
                      test$LoanStatus_B == 1))

it gives me this error:

Error in table(predict(logitMod, type = "response") >= 0.5, test$LoanStatus_B == : all arguments must have the same length

Why is this? How can I fix this? Thank you!


Solution

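  • The problem is the call to predict(): without newdata, it returns fitted values for the data the model was trained on, so you get nrow(train) predictions, which cannot be tabulated against the nrow(test) labels in test$LoanStatus_B. Pass newdata = test to get predictions for the test set. Also, confusionMatrix() from the caret package can take the predictions and the observed classes directly, so you don't need to call table() first.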
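
    To see the mismatch and the fix concretely, here is a minimal base R sketch. It assumes toy data: the names make_data, train_demo, test_demo and fit_demo are hypothetical and not from the question, and the seed is arbitrary.

    ```r
    # Hypothetical toy data; test_demo stands in for your own hold-out set.
    set.seed(42)  # arbitrary seed, only so the sketch is repeatable
    make_data <- function(n) {
      data.frame(LoanStatus_B = as.numeric(rnorm(n) > 0.5),
                 b = rnorm(n), c = rnorm(n), d = rnorm(n))
    }
    train_demo <- make_data(100)
    test_demo  <- make_data(40)

    fit_demo <- glm(LoanStatus_B ~ ., data = train_demo,
                    family = binomial(link = "logit"))

    # Without newdata, predict() returns one fitted value per training row.
    length(predict(fit_demo, type = "response"))   # equals nrow(train_demo)

    # With newdata, there is one prediction per test row.
    test_prob <- predict(fit_demo, newdata = test_demo, type = "response")
    length(test_prob)                              # equals nrow(test_demo)

    # Both arguments now have the same length, so table() works.
    # Fixing the factor levels keeps the table 2 x 2 even if one class
    # is never predicted.
    test_tab <- table(
      Prediction = factor(test_prob >= 0.5, levels = c(FALSE, TRUE)),
      Reference  = factor(test_demo$LoanStatus_B == 1, levels = c(FALSE, TRUE))
    )
    test_tab
    ```

    With your own objects, the equivalent change is to add newdata = test to the predict() call in the test-set version of your code.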

    For the caret version, I created a toy dataset with a binary target variable and trained a model similar to yours.

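    # Toy data: a binary target plus three numeric predictors (no seed is set, so values vary between runs)
    train <- data.frame(LoanStatus_B = as.numeric(rnorm(100) > 0.5), b = rnorm(100), c = rnorm(100), d = rnorm(100))
    logitMod <- glm(LoanStatus_B ~ ., data = train, family = binomial(link = "logit"))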
    

    Now you can generate predictions (here for the training set) and pass them to confusionMatrix(), which takes two arguments:

    • your predictions
    • the observed classes

    library(caret)
    # Use your model to make predictions. Here newdata is the training set; replace it with your test set.
    pdata <- predict(logitMod, newdata = train, type = "response")
    
    # Compute the confusion matrix with caret. Recent caret versions require
    # factors with the same levels for both data and reference.
    confusionMatrix(data = factor(as.numeric(pdata > 0.5), levels = c(0, 1)),
                    reference = factor(train$LoanStatus_B, levels = c(0, 1)))
    

    Here are the results from one run (the toy data is random, so your numbers will differ):

    Confusion Matrix and Statistics
    
              Reference
    Prediction  0  1
             0 66 33
             1  0  1
    
                   Accuracy : 0.67            
                     95% CI : (0.5688, 0.7608)
        No Information Rate : 0.66            
        P-Value [Acc > NIR] : 0.4625
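
    As a check on how to read this output, the accuracy and the no-information rate can be recomputed by hand from the four counts. This is a small base R sketch; the names cm, accuracy and nir are illustrative only, and the counts are copied from the matrix above.

    ```r
    # Counts copied from the output above (rows = Prediction, columns = Reference).
    cm <- matrix(c(66, 0, 33, 1), nrow = 2,
                 dimnames = list(Prediction = c("0", "1"),
                                 Reference  = c("0", "1")))

    # Accuracy: correctly classified cases (the diagonal) over all cases.
    accuracy <- sum(diag(cm)) / sum(cm)   # (66 + 1) / 100 = 0.67

    # No-information rate: share of the largest observed class,
    # i.e. the accuracy of always predicting that class.
    nir <- max(colSums(cm)) / sum(cm)     # 66 / 100 = 0.66
    ```

    The model is only one case better than always predicting 0, which is why the p-value for accuracy exceeding the no-information rate is large.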