In the documentation of the function cv.glmnet()
, it is given that:
lambda.1se :
largest value of lambda such that error is within 1 standard error of the minimum.
Which means that lambda.1se
gives the lambda
, which gives an error (cvm
) which is one standard error away from the minimum error.
So, while trying to check this fact:
There is a data set Boston
in the library MASS
. I performed a cross validation, using the lasso:
x = model.matrix(crim~.-1,data=Boston)#-1 for removing the intercept column
y = Boston$crim
cv.lasso = cv.glmnet(x,y,type.measure = "mse",alpha=1)
And the value of cv.lasso$lambda.min
comes out to be:
> cv.lasso$lambda.min
[1] 0.05630926
And, the value of cv.lasso$lambda.1se
is:
> cv.lasso$lambda.1se
[1] 3.375651
Now, look at this:
> std(cv.lasso$cvm)
[1] 0.7177808
Where std
is a function, that returns the standard error of the values inserted in it.1
And the minimum value of cvm
can be found as:
> cv.lasso$cvm[cv.lasso$lambda==cv.lasso$lambda.min]
[1] 42.95009
So, we add the standard error to the value of cvm
and we get:
> 42.95009+0.7177808
[1] 43.66787
Even though there is no lambda
value corresponding to this cvm
value, we can have an idea on the basis of existing data:
Which means lambda.1se
should be between 0.4784899 and 0.4359821. But that's absolutely not the case. So, there's a gut feeling that says I'm making a blunder here. Can you help me point at that?
1:Definition of std
:
std<-function(x)
sd(x)/sqrt(length(x))
I'll add a seed so one can replicate the results below:
library(glmnet)
library(MASS)
data("Boston")
x = model.matrix(crim~.-1,data=Boston)#-1 for removing the intercept column
y = Boston$crim
set.seed(100)
cv.lasso = cv.glmnet(x,y,type.measure = "mse",alpha=1)
The minimum cross-validated MSE is min(cv.lasso$cvm) = 43.51256
. The corresponding lambda is cv.lasso$lambda.min = 0.01843874
. The lambda.1se
is cv.lasso$lambda.1se = 3.375651
. This corresponds to a cross-validated MSE of
cv.lasso$cvm[which(cv.lasso$lambda == cv.lasso$lambda.1se)] = 57.5393
We can access the cross-validated standard errors directly from GLMNET's output as follows:
cv.lasso$cvsd[which(cv.lasso$lambda == cv.lasso$lambda.min)] = 15.40236
So the cross-validated MSE one standard error away is
43.51256 + 15.40236 = 58.91492
This is just slightly higher than the cross-validated MSE at lambda.1se
above (i.e. 57.5393
). If we look at the cross-validated MSE at the lambda
just before the lambda.1se
, it is:
cv.lasso$cvm[which(cv.lasso$lambda == cv.lasso$lambda.1se)-1] = 59.89079
So now that we can reconcile GLMNET's output, let me explain why you are not getting the same thing using your calculation:
cv.lasso$cvm
contains the cross-validated mean MSE for each value of lambda
. lambda.min
, we have 10 folds. We fit 10 models and have 10 out-of-sample MSE's. The mean of these 10 MSE's is given by cv.lasso$cvm[which(cv.lasso$lambda == cv.lasso$lambda.min)]
. The standard deviation of these 10 MSE's is given by cv.lasso$cvsd[which(cv.lasso$lambda == cv.lasso$lambda.min)]
. What we are not given in the GLMNET output are the 10 MSE's at lambda.min
. If we had this, then we should be able to replicate the the standard error by using the formula you have above.Let me know if this helps.
EDIT: Let's do an example whereby we define in advance three folds
set.seed(100)
folds = sample(1:3, nrow(x), replace = T)
cv.lasso = cv.glmnet(x,y,type.measure = "mse",alpha=1, keep =T, foldid = folds)
Note that
> min(cv.lasso$cvm)
[1] 42.76584
> cv.lasso$cvsd[which.min(cv.lasso$cvm)]
[1] 17.89725
(These differ from the earlier example above because we've defined our own folds)
Note also that I have an additional parameter keep = T
in the cv.glmnet
call. This returns the fold predictions for each lambda. You can extract them for the optimal lambda by doing:
cv.lasso$fit.preval[,which.min(cv.lasso$cvm)]
Before we proceed, let's create a data frame with the response, the fold predictions, and the corresponding folds:
library(data.table)
OOSPred = data.table(y = y,
predictions = cv.lasso$fit.preval[,which.min(cv.lasso$cvm)],
folds = folds)
Here is a preview of the first 10 rows:
> head(OOSPred, 10)
y predictions folds
1: 0.00632 -0.7477977 1
2: 0.02731 -1.3823830 1
3: 0.02729 -3.4826143 2
4: 0.03237 -4.4419795 1
5: 0.06905 -3.4373021 2
6: 0.02985 -2.5256505 2
7: 0.08829 0.7343478 3
8: 0.14455 1.1262462 2
9: 0.21124 4.0507847 2
10: 0.17004 0.5859587 1
For example, for the cases where folds = 1
, a model was built on folds #2 & #3 and then predictions were obtained for the observations in fold #1. We compute the MSE by fold now:
OOSPredSum = OOSPred[, list(MSE = mean((y - predictions)^2)), by = folds]
folds MSE
1: 1 27.51469
2: 2 75.72847
3: 3 19.93480
Finally, we return the mean MSE and standard error of the MSE across the folds
> OOSPredSum[, list("Mean MSE" = mean(MSE), "Standard Error" = sd(MSE)/sqrt(3))]
Mean MSE Standard Error
1: 41.05932 17.47213
GLMNET may be performing a weighted mean and standard error (weighted by the number of observations in each fold), which is why the figures above a close but do not match exactly.