Why do I get probabilities outside 0 and 1 with my regularized logistic glmnet code?


library(tidyverse)
library(readxl)   # read_excel() lives here; library(tidyverse) does not attach it
library(caret)
library(glmnet)

creditdata <- read_excel("R bestanden/creditdata.xlsx")
df <- as.data.frame(creditdata)
df <- na.omit(df)
df$married <- as.factor(df$married)
df$graduate_school <- as.factor(df$graduate_school)
df$high_school <- as.factor(df$high_school)
df$default_payment_next_month <- as.factor(df$default_payment_next_month)
df$sex <- as.factor(df$sex)
df$single <- as.factor(df$single)
df$university <- as.factor(df$university)
set.seed(123)
training.samples <- df$default_payment_next_month %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data  <- df[training.samples, ]
test.data <- df[-training.samples, ]
x <- model.matrix(default_payment_next_month~., train.data)[,-1]
y <- ifelse(train.data$default_payment_next_month == 1, 1, 0)

cv.lasso <- cv.glmnet(x, y, alpha = 1, family = "binomial")
lasso.model <- glmnet(x, y, alpha = 1, family = "binomial",
                      lambda = cv.lasso$lambda.1se)
x.test <- model.matrix(default_payment_next_month ~., test.data)[,-1]
probabilities <- lasso.model %>% predict(newx = x.test)
predicted.classes <- ifelse(probabilities > 0.5, "1", "0")
observed.classes <- test.data$default_payment_next_month
mean(predicted.classes == observed.classes)

Hi guys,

I'm new to R and I've been trying to use the exact code from this website http://www.sthda.com/english/articles/36-classification-methods-essentials/149-penalized-logistic-regression-essentials-in-r-ridge-lasso-and-elastic-net/ to fit a penalized logistic regression (the code above uses the lasso, alpha = 1). My aim is to predict whether a client will default on their credit card, and the data set has factor variables as well as numerical variables. The problem is that most of my predicted probabilities are negative, many of them below -1 (for example -2.6 and -1.4). Does anyone know what is going wrong here?

Thanks in advance for the help!


Solution

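  • Just like for glm, by default the predict function for glmnet returns predictions on the scale of the link function. For family = "binomial" that scale is the log-odds, which can be any real number, so values such as -2.6 are not probabilities. It also means the cutoff probabilities > 0.5 is being applied to log-odds, where a probability of 0.5 corresponds to 0.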

    To get the predicted probabilities, add type = "response" to the predict call:

    probabilities <- lasso.model %>% predict(newx = x.test, type = "response")
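
    As a rough check of the point above, here is a minimal sketch on simulated data (the matrix x, the response y, and the fixed lambda = 0.05 are made up for illustration and are not the credit data from the question). It compares the default link-scale output with type = "response" and shows that applying the inverse logit, plogis(), to the log-odds gives the same probabilities:

```r
library(glmnet)

set.seed(123)

# Simulated stand-in for the credit data (hypothetical, for illustration only)
n <- 200
x <- matrix(rnorm(n * 5), nrow = n, ncol = 5)
y <- rbinom(n, size = 1, prob = plogis(x[, 1] - 0.5 * x[, 2]))

# Lasso-penalized logistic regression at an arbitrary, fixed lambda
fit <- glmnet(x, y, alpha = 1, family = "binomial", lambda = 0.05)

# Default (type = "link"): log-odds, not restricted to [0, 1]
log_odds <- predict(fit, newx = x)

# type = "response": fitted probabilities
probs <- predict(fit, newx = x, type = "response")

range(log_odds)  # can include negative values and values above 1
range(probs)     # always within [0, 1]

# The inverse logit of the link-scale output matches the response-scale output
all.equal(as.numeric(plogis(log_odds)), as.numeric(probs))

# The 0.5 cutoff only makes sense on the probability scale
predicted.classes <- ifelse(probs > 0.5, "1", "0")
```

    With the probabilities on the right scale, the rest of the original code (the 0.5 cutoff and the accuracy calculation) can stay as it is.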