Tags: r, cross-validation, lasso-regression

lambda.1se not being within one standard error of the minimum error


The documentation of the function cv.glmnet() says:

lambda.1se :
largest value of lambda such that error is within 1 standard error of the minimum.

This means that lambda.1se is the lambda whose cross-validated error (cvm) lies within one standard error of the minimum error.

So, while trying to check this fact, I used the Boston data set from the MASS package and performed a cross-validation using the lasso:

library(MASS)
library(glmnet)
x = model.matrix(crim~.-1,data=Boston) # -1 removes the intercept column
y = Boston$crim
cv.lasso = cv.glmnet(x,y,type.measure = "mse",alpha=1)

And the value of cv.lasso$lambda.min comes out to be:

> cv.lasso$lambda.min
[1] 0.05630926

And the value of cv.lasso$lambda.1se is:

> cv.lasso$lambda.1se
[1] 3.375651

Now, look at this:

> std(cv.lasso$cvm)
[1] 0.7177808

Where std is a function that returns the standard error of the values passed to it.¹
And the minimum value of cvm can be found as:

> cv.lasso$cvm[cv.lasso$lambda==cv.lasso$lambda.min]
[1] 42.95009

So, we add this standard error to the minimum value of cvm and get:

> 42.95009+0.7177808
[1] 43.66787

Even though there is no lambda value corresponding exactly to this cvm value, we can get an idea from the existing data:
[screenshot of the lambda/cvm table around cvm ≈ 43.67]
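
For reference, here is a small sketch of how those neighbouring values can be listed (the exact numbers will differ between runs, since no seed was set):

# List the (lambda, cvm) pairs with cvm near the target value of 43.66787:
tab = data.frame(lambda = cv.lasso$lambda, cvm = cv.lasso$cvm)
tab[tab$cvm > 43 & tab$cvm < 44.5, ]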

Which means lambda.1se should be between 0.4784899 and 0.4359821. But that's absolutely not the case. So I have a gut feeling that I'm making a blunder here. Can you help me spot it?


¹ Definition of std:

std <- function(x)
  sd(x)/sqrt(length(x))

Solution

  • I'll add a seed so one can replicate the results below:

    library(glmnet)
    library(MASS)
    data("Boston")
    x = model.matrix(crim~.-1,data=Boston)#-1 for removing the intercept column
    y = Boston$crim
    set.seed(100)
    cv.lasso = cv.glmnet(x,y,type.measure = "mse",alpha=1)
    

    The minimum cross-validated MSE is min(cv.lasso$cvm) = 43.51256. The corresponding lambda is cv.lasso$lambda.min = 0.01843874. The lambda.1se is cv.lasso$lambda.1se = 3.375651. This corresponds to a cross-validated MSE of

    cv.lasso$cvm[which(cv.lasso$lambda == cv.lasso$lambda.1se)] = 57.5393
    

    We can access the cross-validated standard errors directly from GLMNET's output as follows:

    cv.lasso$cvsd[which(cv.lasso$lambda == cv.lasso$lambda.min)] = 15.40236
    

    So the cross-validated MSE one standard error away is

    43.51256 + 15.40236 = 58.91492 
    

    This is just slightly higher than the cross-validated MSE at lambda.1se above (i.e. 57.5393), so lambda.1se does satisfy the documented rule. If we look at the cross-validated MSE at the lambda just before lambda.1se (the next larger lambda, since glmnet's lambda sequence is decreasing), it is:

    cv.lasso$cvm[which(cv.lasso$lambda == cv.lasso$lambda.1se)-1] = 59.89079
    
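    That value (59.89079) exceeds 58.91492, which confirms that lambda.1se is the largest lambda within one standard error of the minimum. As a sanity check, we can reconstruct lambda.1se by hand from cvm and cvsd alone; a minimal sketch, assuming the same seed as above:

    # lambda.1se: the largest lambda whose cvm is within one cvsd of the minimum
    thresh = min(cv.lasso$cvm) + cv.lasso$cvsd[which.min(cv.lasso$cvm)]
    max(cv.lasso$lambda[cv.lasso$cvm <= thresh]) # = 3.375651 = cv.lasso$lambda.1se
    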

    So now that we have reconciled GLMNET's output, let me explain why you are not getting the same thing with your calculation:

    1. cv.lasso$cvm contains the cross-validated mean MSE for each value of lambda.
    2. When we say 1 standard error, we are not talking about the standard error across the lambdas but the standard error across the folds for a given lambda.
    3. Continuing with the above point: at lambda.min we have 10 folds, so we fit 10 models and obtain 10 out-of-sample MSE's. The mean of these 10 MSE's is given by cv.lasso$cvm[which(cv.lasso$lambda == cv.lasso$lambda.min)], and the standard error of these 10 MSE's is given by cv.lasso$cvsd[which(cv.lasso$lambda == cv.lasso$lambda.min)]. What we are not given in the GLMNET output are the 10 MSE's themselves. If we had them, we could replicate the standard error using the formula you have above, as sketched below.
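
    To make point 3 concrete, here is a toy sketch with a hypothetical vector of 10 out-of-fold MSE's at a single lambda (the numbers are made up purely for illustration; cv.glmnet does not return them directly):

    mseByFold = c(40.1, 45.3, 39.8, 50.2, 41.7, 44.0, 38.9, 47.5, 42.3, 43.6)
    mean(mseByFold)                       # what cvm reports at this lambda
    sd(mseByFold)/sqrt(length(mseByFold)) # (approximately) what cvsd reports
    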

    Let me know if this helps.

    EDIT: Let's do an example in which we define three folds in advance:

    set.seed(100)
    folds = sample(1:3, nrow(x), replace = T)
    cv.lasso = cv.glmnet(x,y,type.measure = "mse",alpha=1, keep = T, foldid = folds)
    

    Note that

    > min(cv.lasso$cvm)
    [1] 42.76584
    > cv.lasso$cvsd[which.min(cv.lasso$cvm)]
    [1] 17.89725
    

    (These differ from the earlier example above because we've defined our own folds)

    Note also that I have an additional parameter keep = T in the cv.glmnet call. This returns the out-of-fold (prevalidated) predictions for each lambda. You can extract them for the optimal lambda by doing:

    cv.lasso$fit.preval[,which.min(cv.lasso$cvm)]
    

    Before we proceed, let's create a data frame with the response, the fold predictions, and the corresponding folds:

    library(data.table)
    OOSPred = data.table(y = y, 
                         predictions = cv.lasso$fit.preval[,which.min(cv.lasso$cvm)], 
                         folds = folds)
    

    Here is a preview of the first 10 rows:

    > head(OOSPred, 10)
              y predictions folds
     1: 0.00632  -0.7477977     1
     2: 0.02731  -1.3823830     1
     3: 0.02729  -3.4826143     2
     4: 0.03237  -4.4419795     1
     5: 0.06905  -3.4373021     2
     6: 0.02985  -2.5256505     2
     7: 0.08829   0.7343478     3
     8: 0.14455   1.1262462     2
     9: 0.21124   4.0507847     2
    10: 0.17004   0.5859587     1
    

    For example, for the cases where folds = 1, a model was built on folds #2 and #3, and predictions were then obtained for the observations in fold #1. Now we compute the MSE by fold:

    OOSPredSum = OOSPred[, list(MSE = mean((y - predictions)^2)), by = folds]
    
    > OOSPredSum
       folds      MSE
    1:     1 27.51469
    2:     2 75.72847
    3:     3 19.93480
    

    Finally, we return the mean MSE and the standard error of the MSE across the folds:

    > OOSPredSum[, list("Mean MSE" = mean(MSE), "Standard Error" = sd(MSE)/sqrt(3))]
       Mean MSE Standard Error
    1: 41.05932       17.47213
    

    GLMNET may be performing a weighted mean and standard error (weighted by the number of observations in each fold), which is why the figures above are close but do not match exactly.
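
    If that guess is right, a fold-size-weighted version of the same summary would look something like the sketch below (a hypothesis about GLMNET's internals, not its actual source code):

    # Weight each fold's MSE by the share of observations it contains:
    foldSize = OOSPred[, .N, by = folds][order(folds), N]
    w = foldSize/sum(foldSize)
    mseFold = OOSPredSum[order(folds), MSE]
    wMean = sum(w*mseFold)                                 # weighted mean MSE
    wSE = sqrt(sum(w*(mseFold - wMean)^2)/(length(w) - 1)) # weighted standard error
    c(wMean, wSE)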