I've read a few Q&As about this, but am still not sure I understand, why the coefficients from glmnet and caret models based on the same sample and the same hyper-parameters are slightly different. Would greatly appreciate an explanation!
I am using caret to train a ridge regression:
library(ISLR)
Hitters = na.omit(Hitters)
x = model.matrix(Salary ~ ., Hitters)[, -1] #Dropping the intercept column.
y = Hitters$Salary
set.seed(0)
train = sample(1:nrow(x), 7*nrow(x)/10)
library(caret)
set.seed(0)
train_control = trainControl(method = 'cv', number = 10)
grid = 10 ^ seq(5, -2, length = 100)
tune.grid = expand.grid(lambda = grid, alpha = 0)
ridge.caret = train(x[train, ], y[train],
method = 'glmnet',
trControl = train_control,
tuneGrid = tune.grid)
ridge.caret$bestTune
# alpha is 0 and best lambda is 242.0128
Now, I use the lambda (and alpha) found above to train a ridge regression for the whole data set. At the end, I extract the coefficients:
ridge_full <- train(x, y,
method = 'glmnet',
trControl = trainControl(method = 'none'),
tuneGrid = expand.grid(
lambda = ridge.caret$bestTune$lambda, alpha = 0)
)
coef(ridge_full$finalModel, s = ridge.caret$bestTune$lambda)
Finally, using exactly the same alpha and lambda, I try to fit the same ridge regression using glmnet package - and extract coefficients:
library(glmnet)
ridge_full2 = glmnet(x, y, alpha = 0, lambda = ridge.caret$bestTune$lambda)
coef(ridge_full2)
The reason is the fact the exact lambda you specified was not used by caret. You can check this by:
ridge_full$finalModel$lambda
closest values are 261.28915 and 238.07694.
When you do
coef(ridge_full$finalModel, s = ridge.caret$bestTune$lambda)
where s is 242.0128 the coefficients are interpolated from the coefficients actually calculated.
Wheres when you provide lambda to the glmnet call the model returns exact coefficients for that lambda which differ only slightly from the interpolated ones caret returns.
Why this happens:
when you specify one alpha and one lambda for a fit on all of the data caret will actually fit:
fit = function(x, y, wts, param, lev, last, classProbs, ...) {
numLev <- if(is.character(y) | is.factor(y)) length(levels(y)) else NA
theDots <- list(...)
if(all(names(theDots) != "family")) {
if(!is.na(numLev)) {
fam <- ifelse(numLev > 2, "multinomial", "binomial")
} else fam <- "gaussian"
theDots$family <- fam
}
## pass in any model weights
if(!is.null(wts)) theDots$weights <- wts
if(!(class(x)[1] %in% c("matrix", "sparseMatrix")))
x <- Matrix::as.matrix(x)
modelArgs <- c(list(x = x,
y = y,
alpha = param$alpha),
theDots)
out <- do.call(glmnet::glmnet, modelArgs)
if(!is.na(param$lambda[1])) out$lambdaOpt <- param$lambda[1]
out
}
this was taken from here.
in your example this translates to
fit <- glmnet::glmnet(x, y,
alpha = 0)
lambda <- unique(fit$lambda)
these lambda values correspond to ridge_full$finalModel$lambda
:
all.equal(lambda, ridge_full$finalModel$lambda)
#output
TRUE