I ran a 20-fold cv.glmnet lasso model to obtain the "optimal" value for lambda. However, when I attempt to reproduce the results with glmnet(), I get warnings that read:
Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda
value not reached after maxit=100000 iterations; solutions for larger
lambdas returned
2: In getcoef(fit, nvars, nx, vnames) :
an empty model has been returned; probably a convergence issue
My code reads as such:
set.seed(5)
cv.out <- cv.glmnet(x[train,],y[train],family="binomial",nfolds=20,alpha=1,parallel=TRUE)
coef(cv.out)
bestlam <- cv.out$lambda.min
lasso.mod.best <- glmnet(x[train,],y[train],alpha=1,family="binomial",lambda=bestlam)
Now, the value of bestlam above is 2.976023e-05, so perhaps this is causing the problem? Is it a rounding issue on the value of lambda? Is there a reason why I can't reproduce the results directly from the glmnet() function? If I use a vector of lambda values in a similar range to this value of bestlam, I do not have any issues.
You're passing a single lambda to glmnet (lambda=bestlam), which is a big no-no: you're attempting to fit the model at just one lambda value, so glmnet cannot use warm starts along a path and fails to converge.
From the glmnet documentation (?glmnet):
lambda: A user supplied lambda sequence. Typical usage is to have the
program compute its own lambda sequence based on nlambda and
lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use
with care. Do not supply a single value for lambda (for predictions after CV
use predict() instead). Supply instead a decreasing sequence of lambda
values. glmnet relies on its warms starts for speed, and its often faster to
fit a whole path than compute a single fit.
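
As a rough sketch of what the documentation recommends, the coefficients and predictions at lambda.min can be read from a fit over the whole path instead of refitting at a single lambda. The data below are simulated stand-ins (n, p, and the simulated x and y are assumptions, not the asker's data), and parallel=TRUE is left out because it needs a registered parallel backend.

```r
library(glmnet)

set.seed(5)
# Simulated stand-in data (hypothetical; replace with x[train, ] and y[train])
n <- 200
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))

# Cross-validate over the full lambda path, as in the question
cv.out <- cv.glmnet(x, y, family = "binomial", nfolds = 20, alpha = 1)
bestlam <- cv.out$lambda.min

# Option 1: read the coefficients at lambda.min straight from the CV object
coef.cv <- coef(cv.out, s = "lambda.min")

# Option 2: refit on a decreasing lambda sequence, then extract at bestlam
lasso.mod <- glmnet(x, y, family = "binomial", alpha = 1,
                    lambda = cv.out$lambda)
coef.path <- coef(lasso.mod, s = bestlam)

# Predicted probabilities at lambda.min, using predict() as the docs suggest
prob <- predict(cv.out, newx = x, s = "lambda.min", type = "response")
```

Either option avoids the single-lambda fit: the model is always estimated along a decreasing sequence, and bestlam is only used to pick a point on that path.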