
Why do we need to tune lambda with caret::train(..., method = "glmnet") and cv.glmnet()?

Both caret::train(..., method = "glmnet") with cross-validation and cv.glmnet() can find lambda.min, the value of lambda that minimizes the cross-validation error, and the final model should be the one fitted with lambda.min. Why, then, do we need to supply a grid of lambda values to the training process?
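For reference, the cv.glmnet() route described above might look like the following minimal sketch. The simulated data, the seed, the fold count, and the names `cv_default`, `cv_custom`, and `lambda_grid` are assumptions made for illustration; none of them come from the original post.

```r
library(glmnet)

set.seed(1)
# Simulated two-class data (illustrative assumption): 200 rows, 10 predictors
x <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)
y <- rbinom(200, 1, plogis(2 * x[, 1] - 1.5 * x[, 2]))

# cv.glmnet() cross-validates over lambda only; alpha is fixed for each call
cv_default <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 5)
length(cv_default$lambda)  # size of the automatically generated lambda path
cv_default$lambda.min      # lambda with the lowest mean CV error on that path

# Supplying a grid restricts the search to the values we chose
lambda_grid <- rev(seq(0.0001, 1, length.out = 10))
cv_custom <- cv.glmnet(x, y, family = "binomial", alpha = 1,
                       lambda = lambda_grid, nfolds = 5)
cv_custom$lambda.min
```

In both calls lambda.min is only the best value among the lambdas that were actually evaluated, so the choice of grid (automatic or custom) determines which candidates are considered.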


  • We use a custom tuning grid for a glmnet model because the default tuning grid is very small, and there are many more potential glmnet models we may want to explore.

    glmnet can fit two different kinds of penalized models, as well as mixtures of the two, and it has two tuning parameters:

    1. alpha
      • Ridge regression (alpha = 0)
      • Lasso regression (alpha = 1)
      • Values between 0 and 1 mix the two penalties (elastic net)
    2. lambda
      • The strength of the penalty on the coefficients

    glmnet can fit many models at once: for a single alpha, all values of lambda are fit simultaneously. We can therefore pass a large number of lambda values, which control the amount of penalization in the model.

    train() is smart enough to fit only one model per alpha value and pass all of the lambda values at once for simultaneous fitting. This is why the training log below shows only the largest lambda for each alpha in every fold.
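The "many lambdas in one fit" behavior can be seen by calling glmnet() directly. This is a minimal sketch on simulated data; the data, the seed, and the names `lambda_grid`, `ridge_fit`, and `lasso_fit` are illustrative assumptions and do not come from the answer's `overfit` dataset.

```r
library(glmnet)

set.seed(42)
# Simulated data (illustrative assumption): 100 rows, 20 predictors
x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
y <- factor(rbinom(100, 1, 0.5), levels = c(0, 1),
            labels = c("class1", "class2"))

# One lambda grid, supplied in decreasing order
lambda_grid <- rev(seq(0.0001, 1, length.out = 10))

# A single call per alpha fits the whole lambda path
ridge_fit <- glmnet(x, y, family = "binomial", alpha = 0, lambda = lambda_grid)
lasso_fit <- glmnet(x, y, family = "binomial", alpha = 1, lambda = lambda_grid)

# Rows: intercept plus 20 predictors; columns: one per fitted lambda value
dim(coef(ridge_fit))
dim(coef(lasso_fit))

# Coefficients at one particular lambda can be pulled out without refitting
coef(lasso_fit, s = lambda_grid[5])
```

Because each call already returns coefficients for every lambda on the path, evaluating more lambda values for a given alpha does not require a separate model fit per value.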


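    library(caret)
    # Make a custom tuning grid: 2 alpha values x 10 lambda values
    tuneGrid <- expand.grid(alpha = 0:1, lambda = seq(0.0001, 1, length.out = 10))
    # Fit a model (overfit and myControl are assumed to be defined earlier)
    model <- train(y ~ ., data = overfit, method = "glmnet",
      tuneGrid = tuneGrid, trControl = myControl)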
    # Sample Output
    Warning message: The metric "Accuracy" was not in the result set. ROC will be used instead.
    + Fold01: alpha=0, lambda=1 
    - Fold01: alpha=0, lambda=1 
    + Fold01: alpha=1, lambda=1 
    - Fold01: alpha=1, lambda=1 
    + Fold02: alpha=0, lambda=1 
    - Fold02: alpha=0, lambda=1 
    + Fold02: alpha=1, lambda=1 
    - Fold02: alpha=1, lambda=1 
    + Fold03: alpha=0, lambda=1 
    - Fold03: alpha=0, lambda=1 
    + Fold03: alpha=1, lambda=1 
    - Fold03: alpha=1, lambda=1 
    + Fold04: alpha=0, lambda=1 
    - Fold04: alpha=0, lambda=1 
    + Fold04: alpha=1, lambda=1 
    - Fold04: alpha=1, lambda=1 
    + Fold05: alpha=0, lambda=1 
    - Fold05: alpha=0, lambda=1 
    + Fold05: alpha=1, lambda=1 
    - Fold05: alpha=1, lambda=1 
    + Fold06: alpha=0, lambda=1 
    - Fold06: alpha=0, lambda=1 
    + Fold06: alpha=1, lambda=1 
    - Fold06: alpha=1, lambda=1 
    + Fold07: alpha=0, lambda=1 
    - Fold07: alpha=0, lambda=1 
    + Fold07: alpha=1, lambda=1 
    - Fold07: alpha=1, lambda=1 
    + Fold08: alpha=0, lambda=1 
    - Fold08: alpha=0, lambda=1 
    + Fold08: alpha=1, lambda=1 
    - Fold08: alpha=1, lambda=1 
    + Fold09: alpha=0, lambda=1 
    - Fold09: alpha=0, lambda=1 
    + Fold09: alpha=1, lambda=1 
    - Fold09: alpha=1, lambda=1 
    + Fold10: alpha=0, lambda=1 
    - Fold10: alpha=0, lambda=1 
    + Fold10: alpha=1, lambda=1 
    - Fold10: alpha=1, lambda=1 
    Aggregating results
    Selecting tuning parameters
    Fitting alpha = 1, lambda = 1 on full training set
    # Print model to console
    model
    # Sample Output
    250 samples
    200 predictors
      2 classes: 'class1', 'class2' 
    No pre-processing
    Resampling: Cross-Validated (10 fold) 
    Summary of sample sizes: 225, 225, 225, 225, 224, 226, ... 
    Resampling results across tuning parameters:
      alpha  lambda  ROC        Sens  Spec     
      0      0.0001  0.3877717  0.00  0.9786232
      0      0.1112  0.4352355  0.00  1.0000000
      0      0.2223  0.4546196  0.00  1.0000000
      0      0.3334  0.4589674  0.00  1.0000000
      0      0.4445  0.4718297  0.00  1.0000000
      0      0.5556  0.4762681  0.00  1.0000000
      0      0.6667  0.4783514  0.00  1.0000000
      0      0.7778  0.4826087  0.00  1.0000000
      0      0.8889  0.4869565  0.00  1.0000000
      0      1.0000  0.4869565  0.00  1.0000000
      1      0.0001  0.3368659  0.05  0.9188406
      1      0.1112  0.5000000  0.00  1.0000000
      1      0.2223  0.5000000  0.00  1.0000000
      1      0.3334  0.5000000  0.00  1.0000000
      1      0.4445  0.5000000  0.00  1.0000000
      1      0.5556  0.5000000  0.00  1.0000000
      1      0.6667  0.5000000  0.00  1.0000000
      1      0.7778  0.5000000  0.00  1.0000000
      1      0.8889  0.5000000  0.00  1.0000000
      1      1.0000  0.5000000  0.00  1.0000000
    ROC was used to select the optimal model using  the largest value.
    The final values used for the model were alpha = 1 and lambda = 1.
    # Plot model
    plot(model)

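Since the answer does not show how `overfit` or `myControl` were created, here is a self-contained sketch of the same workflow on simulated data. The dataset, the 5-fold setting, and the names `dat`, `ctrl`, `grid`, and `fit` are assumptions made for illustration; the trainControl() arguments are a guess at a setup that would report ROC, Sens, and Spec, not the original author's configuration.

```r
library(caret)
library(glmnet)

set.seed(42)
# Simulated two-class data (illustrative assumption): 200 rows, 10 predictors
x <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10,
            dimnames = list(NULL, paste0("V", 1:10)))
signal <- 2 * x[, 1] - 1.5 * x[, 2]
y <- factor(ifelse(signal + rnorm(200) > 0, "class2", "class1"),
            levels = c("class1", "class2"))
dat <- data.frame(y = y, x)

# Cross-validation that reports ROC, sensitivity, and specificity
ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

# Same grid shape as in the answer: 2 alpha values x 10 lambda values
grid <- expand.grid(alpha = 0:1, lambda = seq(0.0001, 1, length.out = 10))

fit <- train(y ~ ., data = dat, method = "glmnet", metric = "ROC",
             tuneGrid = grid, trControl = ctrl)

# The winning (alpha, lambda) pair and the full resampling table
fit$bestTune
fit$results[, c("alpha", "lambda", "ROC")]

# Coefficients of the final model at the selected lambda
coef(fit$finalModel, s = fit$bestTune$lambda)
```

Setting metric = "ROC" explicitly avoids the "Accuracy was not in the result set" warning seen in the sample output, because twoClassSummary does not compute accuracy.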