Search code examples
rglmnet

R coefficients of glmnet::cvfit


As far as I am concerned, cvfit does a K fold cross validation, which means that in each time, it separates all the data into training & validation set. For every fixed lambda, first it uses training data to get a coefficient vector. Then implements this constructed model to predict on the validation set to get the error.

Hence, for K fold CV, it has k coefficient vectors (each is generated from a training set). So what does

coef(cvfit)

get?

Here is an example:

x <- iris[1:100,1:4]
y <- iris[1:100,5]
y <- factor(y)

fit <- cv.glmnet(data.matrix(x), y, family = "binomial", type.measure =       "class",alpha=1,nfolds=3,standardize = T)
coef(fit, s=c(fit$lambda.min,fit$lambda.1se))

fit1 <- glmnet(data.matrix(x), y, family = "binomial",
           standardize = T,
           lambda = c(fit$lambda.1se,fit$lambda.min))
coef(fit1)

in fit1, I use the whole dataset as the training set, seems that the coefficients of fit1 and fit are just the same. That's why?

Thanks in advance.


Solution

  • Although cv.glmnet checks model performance by cross-validation, the actual model coefficients it returns for each lambda value are based on fitting the model with the full dataset.

    The help for cv.glmnet (type ?cv.glmnet) includes a Value section that describes the object returned by cv.glmet. The returned list object (fit in your case) includes an element called glmnet.fit. The help describes it like this:

    glmnet.fit a fitted glmnet object for the full data.