Tags: r, regression, glmnet, lasso-regression

glmnet: at what lambda is each coefficient shrunk to 0?


I am using LASSO (from the package glmnet) to select variables. I have fitted a glmnet model and plotted the coefficients against lambda.

library(glmnet)
set.seed(47)
x = matrix(rnorm(100 * 3), 100, 3)
y = rnorm(100)
fit = glmnet(x, y)
plot(fit, xvar = "lambda", label = TRUE)

[Figure: coefficient paths plotted against lambda, as produced by the code above]

Now I want to get the order in which coefficients become 0. In other words, at what lambda does each coefficient become 0?

I can't find a function in glmnet that extracts this result. How can I get it?


Solution

  • The function glmnetPath from my initial answer is now in an R package called solzy.

    ## you may need to first install package "remotes" from CRAN
    remotes::install_github("ZheyuanLi/solzy")
    
    ## Zheyuan Li's R functions on Stack Overflow
    library(solzy)
    
    ## use function `glmnetPath` for your example
    glmnetPath(fit)
    
    #$enter
    #  i  j ord var     lambda
    #1 3  2   1  V3 0.15604809
    #2 2 19   2  V2 0.03209148
    #3 1 24   3  V1 0.02015439
    #
    #$leave
    #  i  j ord var     lambda
    #1 1 23   1  V1 0.02211941
    #2 2 18   2  V2 0.03522036
    #3 3  1   3  V3 0.17126258
    #
    #$ignored
    #[1] i   var
    #<0 rows> (or 0-length row.names)
    

    Interpretation of enter

    As lambda decreases, variables (see i for numeric ID and var for variable names) enter the model in turn (see ord for the order). The corresponding lambda for the event is fit$lambda[j].

    • variable 3 enters the model at lambda = 0.15604809, the 2nd value in fit$lambda;

    • variable 2 enters the model at lambda = 0.03209148, the 19th value in fit$lambda;

    • variable 1 enters the model at lambda = 0.02015439, the 24th value in fit$lambda.

    Interpretation of leave

    As lambda increases, variables (see i for numeric ID and var for variable names) leave the model in turn (see ord for the order). The corresponding lambda for the event is fit$lambda[j].

    • variable 1 leaves the model at lambda = 0.02211941, the 23rd value in fit$lambda;

    • variable 2 leaves the model at lambda = 0.03522036, the 18th value in fit$lambda;

    • variable 3 leaves the model at lambda = 0.17126258, the 1st value in fit$lambda.

    Interpretation of ignored

    If ignored is not an empty data.frame, it lists the variables that never enter the model at any lambda in fit$lambda. That is, they are effectively ignored. (Yes, this can happen!)

    Note: fit$lambda is decreasing, so j is in ascending order in enter but in descending order in leave.


    To further explain indices i and j, take variable 2 as an example. It leaves the model (i.e., its coefficient becomes 0) at j = 18 and enters the model (i.e., its coefficient becomes non-zero) at j = 19. You can verify this:

    fit$beta[2, 1:18]
    ## all zeros
    
    fit$beta[2, 19:ncol(fit$beta)]
    ## all non-zeros
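
The same check can be written as a small assertion. The sketch below is illustrative only: `is_clean_split` is a hypothetical helper (it is not part of glmnet or solzy), and the toy vector with made-up numbers stands in for one row of fit$beta.

```r
## Hypothetical helper (not from glmnet or solzy): TRUE when the coefficient
## path `b` is exactly zero at positions 1..j_leave and non-zero afterwards.
is_clean_split <- function(b, j_leave) {
  b <- as.numeric(b)
  before <- b[seq_len(j_leave)]
  after  <- b[seq_len(length(b) - j_leave) + j_leave]
  all(before == 0) && all(after != 0)
}

## Toy path standing in for one row of fit$beta (made-up numbers)
b <- c(0, 0, 0, 0.12, 0.30, 0.41)
is_clean_split(b, 3)  # zero up to position 3, non-zero from position 4
is_clean_split(b, 2)  # position 3 is still zero, so the split is not clean
```

For the fit in the question, is_clean_split(fit$beta[2, ], 18) should agree with the manual check above, assuming the leave and enter tables shown earlier.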
    

    See "Obtain variable selection order from glmnet" for a more complicated example.
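
If installing a package is not an option, the enter and leave events can be approximated directly from the coefficient matrix. The following is a minimal base-R sketch, not the implementation of glmnetPath. The name `path_events` is hypothetical, the function assumes the columns are ordered by decreasing lambda (as in fit$beta), and it records only the first time each variable enters, so a variable that leaves and later re-enters the model is not handled.

```r
## Hypothetical helper (not part of glmnet or solzy).
## beta:   coefficient matrix, one row per variable, one column per lambda
## lambda: lambda values in decreasing order, matching the columns of beta
path_events <- function(beta, lambda) {
  nz <- as.matrix(beta) != 0
  ## first column where each variable is non-zero; NA if it never enters
  j_enter <- unname(apply(nz, 1, function(r) {
    if (any(r)) which(r)[1] else NA_integer_
  }))
  ## the column just before entry is the last one where the variable is zero
  j_leave <- j_enter - 1L
  j_leave[!is.na(j_leave) & j_leave < 1L] <- NA_integer_
  data.frame(i = seq_len(nrow(nz)),
             j_enter = j_enter, lambda_enter = lambda[j_enter],
             j_leave = j_leave, lambda_leave = lambda[j_leave])
}

## Toy example with made-up numbers: 4 variables, 4 lambda values
lambda <- c(0.4, 0.3, 0.2, 0.1)
beta <- rbind(c(0, 0,   0,   0.5),
              c(0, 0,   0.2, 0.4),
              c(0, 0.1, 0.3, 0.6),
              c(0, 0,   0,   0))     # never enters
ev <- path_events(beta, lambda)
ev[order(ev$j_enter), ]              # rows sorted by entry order; NA rows last
```

With the question's fit, path_events(fit$beta, fit$lambda) should work once glmnet (and therefore Matrix) is loaded, because as.matrix can then convert the sparse coefficient matrix. When every variable enters exactly once, the result is expected to correspond to the i, j and lambda columns of the enter and leave tables above; rows with NA play the role of ignored.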