I've noticed that when running penalized logistic regression with caret and the glmnet package, the model predictions are returned as 0/1 class labels rather than probabilities:
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
train_control <- trainControl(method="cv", number=10, savePredictions = TRUE)
glmnetGrid <- expand.grid(alpha=c(0, .5, 1), lambda=c(.1, 1, 10))
model <- train(as.factor(admit) ~ .,
               data = mydata,
               trControl = train_control,
               method = "glmnet",
               family = "binomial",
               tuneGrid = glmnetGrid,
               metric = "Accuracy",
               preProcess = c("center", "scale"))
model
glmnet
400 samples
3 predictor
2 classes: '0', '1'
Pre-processing: centered (3), scaled (3)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 360, 360, 361, 359, 360, 361, ...
Resampling results across tuning parameters:
alpha lambda Accuracy Kappa Accuracy SD Kappa SD
0.0 0.1 0.6923233271 0.09027099758 0.018975211636 0.06988057154
0.0 1.0 0.6825703565 0.00000000000 0.007557700521 0.00000000000
0.0 10.0 0.6825703565 0.00000000000 0.007557700521 0.00000000000
0.5 0.1 0.6825703565 0.00000000000 0.007557700521 0.00000000000
0.5 1.0 0.6825703565 0.00000000000 0.007557700521 0.00000000000
0.5 10.0 0.6825703565 0.00000000000 0.007557700521 0.00000000000
1.0 0.1 0.6825703565 0.00000000000 0.007557700521 0.00000000000
1.0 1.0 0.6825703565 0.00000000000 0.007557700521 0.00000000000
1.0 10.0 0.6825703565 0.00000000000 0.007557700521 0.00000000000
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were alpha = 0 and lambda = 0.1.
> head(model$pred)
pred obs rowIndex alpha lambda Resample
1 0 0 16 0 10 Fold01
2 0 0 17 0 10 Fold01
3 0 0 24 0 10 Fold01
4 0 1 46 0 10 Fold01
5 0 0 69 0 10 Fold01
6 0 0 84 0 10 Fold01
> summary(model$pred)
pred obs rowIndex alpha lambda Resample
0:3576 0:2457 Min. : 1.00 Min. :0.0 Min. : 0.1 Length:3600
1: 24 1:1143 1st Qu.:100.75 1st Qu.:0.0 1st Qu.: 0.1 Class :character
Median :200.50 Median :0.5 Median : 1.0 Mode :character
Mean :200.50 Mean :0.5 Mean : 3.7
3rd Qu.:300.25 3rd Qu.:1.0 3rd Qu.:10.0
Max. :400.00 Max. :1.0 Max. :10.0
Is it possible to obtain the raw predicted probabilities (the inverse logit of the linear predictor, exp(eta) / (1 + exp(eta))) rather than the 0/1 predicted classes?
You have to set the option classProbs = TRUE in trainControl. The factor levels of admit also need to be valid R variable names (not "0"/"1"), because they will be used as the column names of the probability columns. See the following example.
library(caret)
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata$admit <- as.factor(mydata$admit)
# create levels yes/no so the class probability columns get valid names;
# note: labels are assigned in level order, so 0 -> "yes" and 1 -> "no" here
levels(mydata$admit) <- c("yes", "no")
train_control <- trainControl(method="cv", number=10, classProbs = TRUE, savePredictions = TRUE)
glmnetGrid <- expand.grid(alpha=c(0, .5, 1), lambda=c(.1, 1, 10))
set.seed(4242)
model<- train(admit ~ .,
data=mydata,
trControl = train_control,
method="glmnet",
family="binomial",
tuneGrid=glmnetGrid,
metric="Accuracy",
preProcess=c("center","scale"))
head(model$pred)
pred obs rowIndex yes no alpha lambda Resample
1 yes no 4 0.6856383 0.3143617 0 10 Fold01
2 yes no 6 0.6796251 0.3203749 0 10 Fold01
3 yes yes 10 0.6764742 0.3235258 0 10 Fold01
4 yes yes 71 0.6795685 0.3204315 0 10 Fold01
5 yes no 78 0.6774003 0.3225997 0 10 Fold01
6 yes yes 82 0.6812158 0.3187842 0 10 Fold01
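Once the model has been fit with classProbs = TRUE, you can also obtain probabilities directly from predict() with type = "prob", and you can filter model$pred down to the selected tuning parameters. A minimal sketch, assuming the model object fitted above:

```r
# class probabilities for each row of mydata; the columns are named
# after the factor levels ("yes" and "no")
probs <- predict(model, newdata = mydata, type = "prob")
head(probs)

# model$pred holds held-out predictions for every alpha/lambda combination;
# merging with model$bestTune keeps only the rows for the winning pair
best_preds <- merge(model$pred, model$bestTune)
head(best_preds)
```

Note that predict(model, type = "prob") returns in-sample probabilities here, whereas the rows in model$pred are the cross-validated (held-out) predictions.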