Tags: r, machine-learning, statistics, regression, glmnet

In R, how can I explain the graph in glmnet package?


I am trying to understand the "glmnet" package, but I still have some questions about its cross-validation plot:

1. What is the meaning of the numbers along the top (31, 31, 31, ..., 3, 2, 2, 2)?
2. What are the vertical dotted lines? Why are two lines shown?
3. Why does the curve show this curvilinear pattern?

library(glmnet)
data(MultinomialExample)
cvfit=cv.glmnet(x, y, family="multinomial", type.multinomial = "grouped")
plot(cvfit)

Below is the plot of cvfit:

[cross-validation plot produced by plot(cvfit)]

Thank you


Solution

  • With cross-validation, you are trying to find the best value of lambda — in this case for the elastic net. Briefly, the elastic net is a mixture of lasso and ridge regression, both of which shrink your coefficients towards zero. Lambda (λ) basically tells you how strongly to shrink them.
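    As a side note, the lasso/ridge mixture is controlled by a separate parameter, alpha; cv.glmnet's default alpha = 1 corresponds to the pure lasso. A minimal sketch on toy data (assuming the glmnet package is installed):

    ```r
    library(glmnet)

    set.seed(1)
    x <- matrix(rnorm(100 * 10), 100, 10)  # toy predictor matrix
    y <- rnorm(100)                        # toy response

    fit_ridge <- glmnet(x, y, alpha = 0)    # alpha = 0: pure ridge
    fit_lasso <- glmnet(x, y, alpha = 1)    # alpha = 1: pure lasso (the default)
    fit_enet  <- glmnet(x, y, alpha = 0.5)  # 0 < alpha < 1: elastic net
    ```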

    1. The numbers along the top are the numbers of non-zero coefficients.

    On the x-axis you have the different lambda values that glmnet tried during cross-validation. At the extreme left, lambda is close to zero, so there is almost no shrinkage and you would expect nearly all of your coefficients to be non-zero — which is what the numbers on top represent. You can also see this under:

    cvfit$nzero
     s0  s1  s2  s3  s4  s5  s6  s7  s8  s9 s10 s11 s12 s13 s14 s15 s16 s17 s18 s19 
      0   1   1   1   1   1   1   2   3   3   7   7   8   8   9   9   9  10  10  10 
    s20 s21 s22 s23 s24 s25 s26 s27 s28 s29 s30 s31 s32 s33 s34 s35 s36 s37 s38 s39 
     12  13  14  14  18  18  20  20  21  23  23  25  26  26  26  26  27  27  28  28 
    s40 s41 s42 s43 s44 s45 s46 s47 s48 s49 s50 s51 s52 s53 s54 s55 s56 s57 s58 s59 
     29  29  30  30  30  30  30  30  30  30  30  30  30  30  30  30  30  30  30  30 
    s60 s61 s62 s63 s64 s65 s66 s67 s68 s69 s70 s71 
     30  30  30  30  30  30  30  30  30  30  30  30 
    

    which is from the vignette:

    nzero: number of non-zero coefficients at each ‘lambda’.
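    To see which of these counts correspond to the two dotted lines, you can index nzero by the stored lambda sequence (this assumes the cvfit object from the question's code):

    ```r
    # Each entry of cvfit$nzero is the coefficient count for one lambda,
    # so the numbers printed along the top of plot(cvfit) come from here.
    cvfit$nzero[cvfit$lambda == cvfit$lambda.min]  # count at the left dotted line
    cvfit$nzero[cvfit$lambda == cvfit$lambda.1se]  # count at the right dotted line
    ```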

    2. The vertical dotted lines mark the lambda with minimum deviance and the lambda one standard error away from it.

    The y-axis is the cross-validated deviance, which measures the prediction error at each tested lambda. The lower it is, the better the predictive ability of your model. You would expect an optimal lambda that gives you the least error in prediction; this is the first line from the left:

    cvfit$lambda.min
    [1] 0.01291017
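    To actually use this value, you can extract the coefficients at it by name (again assuming cvfit from above; for a multinomial fit, coef() returns one sparse matrix per class):

    ```r
    # Coefficients of the model at the deviance-minimising lambda
    coef(cvfit, s = "lambda.min")
    ```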
    

    The next line is the largest lambda whose error is within one standard error of that minimum; it uses fewer coefficients (hence is more parsimonious) while still being close to the best predictive model. This is the second line:

    cvfit$lambda.1se
    [1] 0.02717467
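    In practice, either value can be passed to predict() via the s argument; lambda.1se is a common default choice. A sketch, assuming the cvfit and x objects from the question's code:

    ```r
    # Predicted class labels under the more parsimonious model
    pred <- predict(cvfit, newx = x, s = "lambda.1se", type = "class")
    head(pred)
    ```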
    

    You can read more about it in the glmnet paper: Friedman, Hastie & Tibshirani (2010), "Regularization Paths for Generalized Linear Models via Coordinate Descent".