
`table` not showing in matrix format


I'm trying to generate a confusion table using the HMDA data from the AER package. I fit a probit model, predicted on the testing set, and used the table() function to build a 2 by 2 table, but R returns a long list instead of the 2 by 2 matrix I wanted.

Could anyone tell me what's going on?

# load required packages and data (HMDA)
library(e1071)
library(caret)
library(AER)
library(plotROC)
data(HMDA)

# again, check variable columns
names(HMDA)

# convert the dependent variable to numeric (1 = denied, 0 = not denied)
HMDA$deny <- ifelse(HMDA$deny == "yes", 1, 0)

# subset needed columns
subset <- c("deny", "hirat", "lvrat", "mhist", "unemp")

# subset data
data <- HMDA[complete.cases(HMDA), subset]

# do a 75-25 train-test split
train_row_numbers <- createDataPartition(data$deny, p=0.75, list=FALSE)
training <- data[train_row_numbers, ]
testing <- data[-train_row_numbers, ]


# fit a probit model and predict on testing data
probit.fit <- glm(deny ~ ., family = binomial(link = "probit"), data = training)
probit.pred <- predict(probit.fit, testing)

confmat_probit <- table(Predicted = probit.pred, 
               Actual = testing$deny)
confmat_probit


Solution

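  • You need to specify a threshold (cut-point) to turn the predictions into a dichotomous outcome. predict returns continuous predicted values, not 0 / 1, so table() cross-tabulates every distinct predicted value against the actual outcome, which is why you get a long list instead of a 2 by 2 table.

    Also be careful with the predict function: the default is type="link", which for your model returns the linear predictor on the probit scale rather than probabilities. If you want predict to return probabilities, specify type="response".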

    probit.pred <- predict(probit.fit, testing, type="response")
    

    Then choose a cut-point; any predicted probability above this value is classified as TRUE (a predicted denial):

    confmat_probit <- table(`Predicted>0.1` = probit.pred > 0.1 , Actual = testing$deny)
    confmat_probit
    

                 Actual
    Predicted>0.1   0   1
            FALSE 248  21
            TRUE  273  53
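
As a rough illustration of the two points above, the sketch below uses simulated stand-in data (not HMDA) and base R only. The names sim, eta, p, cutoff, pred_class and accuracy are made up for this example, and the 0.5 cut-point is arbitrary. It checks that, for a probit model, the "link" predictions map to the "response" probabilities through pnorm(), and it fixes the factor levels so the table stays 2 by 2 even if the cut-point puts every case in one class.

```r
# Simulated stand-in data (hypothetical; NOT the HMDA data from the question)
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, pnorm(-1 + 0.8 * x))
sim <- data.frame(y = y, x = x)

fit <- glm(y ~ x, family = binomial(link = "probit"), data = sim)

# Default type = "link": linear predictor on the probit (z-score) scale
eta <- predict(fit, sim)
# type = "response": predicted probabilities between 0 and 1
p <- predict(fit, sim, type = "response")

# For a probit model the two are related by the standard normal CDF
link_matches_response <- isTRUE(all.equal(pnorm(eta), p))

# Fix both factors to levels 0/1 so the table is always 2 x 2,
# even if the cut-point assigns every case to a single class
cutoff <- 0.5  # illustrative cut-point, not a recommendation
pred_class <- factor(as.integer(p > cutoff), levels = c(0, 1))
actual <- factor(sim$y, levels = c(0, 1))
confmat <- table(Predicted = pred_class, Actual = actual)
confmat

# Share of correctly classified cases (diagonal of the table)
accuracy <- sum(diag(confmat)) / sum(confmat)
```

Which cut-point is sensible depends on the relative cost of false positives and false negatives, so it is worth comparing the table at several values rather than relying on a single one.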