I'm trying to generate a confusion table using the HMDA data from the AER package. I fit a probit model, predicted on the testing set, and used the table() function to generate a 2 by 2 table, but R returns a long list instead of the 2 by 2 matrix I wanted.
Could anyone tell me what's going on?
# load required packages and data (HMDA)
library(e1071)
library(caret)
library(AER)
library(plotROC)
data(HMDA)
# again, check variable columns
names(HMDA)
# convert dependent variables to numeric
HMDA$deny <- ifelse(HMDA$deny == "yes", 1, 0)
# subset needed columns
subset <- c("deny", "hirat", "lvrat", "mhist", "unemp")
# subset data
data <- HMDA[complete.cases(HMDA), subset]
# do a 75-25 train-test split
train_row_numbers <- createDataPartition(data$deny, p=0.75, list=FALSE)
training <- data[train_row_numbers, ]
testing <- data[-train_row_numbers, ]
# fit a probit model and predict on testing data
probit.fit <- glm(deny ~ ., family = binomial(link = "probit"), data = training)
probit.pred <- predict(probit.fit, testing)
confmat_probit <- table(Predicted = probit.pred,
Actual = testing$deny)
confmat_probit
You need to specify a threshold (cut-point) to turn the predictions into a dichotomous outcome. predict() returns continuous predicted values, not 0/1, so table() creates one row for every distinct predicted value.
Also be careful with predict(): its default is type = "link", which for your model is the linear predictor on the probit scale, not a probability. If you want predict() to return probabilities, specify type = "response".
probit.pred <- predict(probit.fit, testing, type="response")
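To see the difference between the two scales, here is a minimal sketch on simulated data (the variables x and y and the coefficients are made up for illustration, not taken from HMDA). It checks that applying pnorm() to the link-scale predictions should reproduce the type = "response" probabilities, and that tabulating the raw predictions gives one row per distinct value rather than two.

```r
# Simulated data, for illustration only (not the HMDA data)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, pnorm(-0.5 + x))

fit <- glm(y ~ x, family = binomial(link = "probit"))

eta <- predict(fit)                     # default type = "link": linear predictor
p   <- predict(fit, type = "response")  # predicted probabilities

# For a probit model the inverse link is the standard normal CDF
same_scale <- isTRUE(all.equal(unname(pnorm(eta)), unname(p)))

# Tabulating continuous predictions: one row per distinct predicted value
n_rows <- nrow(table(Predicted = eta, Actual = y))
```

Here n_rows equals the number of distinct predicted values, which is why the original table() call printed a long list.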
Then choose a cut-point; any prediction above this value will be TRUE:
confmat_probit <- table(`Predicted>0.1` = probit.pred > 0.1 , Actual = testing$deny)
confmat_probit
             Actual
Predicted>0.1   0   1
        FALSE 248  21
        TRUE  273  53
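If you want the table to stay 2 by 2 even when a cut-point puts every prediction in one class, one option is to convert both vectors to factors with fixed levels before tabulating. The sketch below uses small hypothetical vectors (prob and actual are invented values, not HMDA results) and a cutoff of 0.1 as in the example above.

```r
# Hypothetical predicted probabilities and observed outcomes (illustration only)
prob   <- c(0.05, 0.20, 0.08, 0.60, 0.12, 0.03)
actual <- c(0, 0, 0, 1, 1, 0)
cutoff <- 0.1

# Fixed factor levels keep both rows and columns even if a class is empty
pred <- factor(as.integer(prob > cutoff), levels = c(0, 1))
act  <- factor(actual, levels = c(0, 1))

cm <- table(Predicted = pred, Actual = act)

# Share of observations on the diagonal (correctly classified)
accuracy <- sum(diag(cm)) / sum(cm)
```

The cutoff itself is a judgment call: a lower value flags more cases as positive, trading false negatives for false positives.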