glmnet, lasso-regression

How does glmnet() handle both penalized and unpenalized covariates?


Is it possible to fit a lasso model with both penalized and unpenalized covariates? That is, I want to estimate Y ~ gamma * X + beta * Z, where X is an n*p matrix of penalized features and Z is an n*q matrix of unpenalized covariates, which may be continuous or factor variables.

Thanks.


Solution

  • This is covered in the glmnet vignette, in the section called Penalty Factors. To leave some variables unpenalized, set their entries in penalty.factor to 0. Create a vector of length ncol(X) + ncol(Z), in the same order as the columns of the combined matrix, where the first ncol(X) entries are 1 (or any positive number) and the other ncol(Z) entries are 0. For example:

    set.seed(1234)
    n <- 100 # number of samples
    px <- 5 # number of x variables
    pz <- 5 # number of z variables
    x <- matrix(rnorm(n*px), ncol = px)
    z <- matrix(rnorm(n*pz), ncol = pz)
    
    y <- x[,1] + x[,5] + 2*z[,1] + 3*rnorm(n) # generate response
    penalty <- c(rep(1, px), rep(0, pz)) # penalty factor
    
    plot(glmnet::glmnet(cbind(x,z), y, penalty.factor = penalty))
    

    Notice in the plot of the solution path that five of the coefficients (those of the z variables) are never shrunk to 0, because they are never penalized.
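The question also mentions factor variables in Z. glmnet() expects a numeric matrix, so one way to handle this is to expand the factors into dummy columns first and give every resulting column a penalty factor of 0. The following is a minimal sketch under that assumption; the data frame `covars` and its columns `age` and `group` are hypothetical names made up for illustration, and the simulated response is arbitrary.

```r
library(glmnet)

set.seed(1234)
n  <- 100
px <- 5

# Penalized features
x <- matrix(rnorm(n * px), ncol = px)
colnames(x) <- paste0("x", seq_len(px))

# Hypothetical unpenalized covariates: one continuous, one three-level factor
covars <- data.frame(
  age   = rnorm(n),
  group = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)

# Expand the factor into dummy columns and drop the intercept column,
# since glmnet() fits its own intercept
z <- model.matrix(~ age + group, data = covars)[, -1, drop = FALSE]

y <- x[, 1] + x[, 5] + 2 * covars$age + 3 * rnorm(n) # arbitrary response

xz <- cbind(x, z)

# One entry per column of xz: 1 for the x columns, 0 for every z column
# (including each dummy column created from the factor)
penalty <- c(rep(1, ncol(x)), rep(0, ncol(z)))

fit <- glmnet(xz, y, penalty.factor = penalty)

# Coefficients at the start of the path (largest lambda), intercept dropped
b <- as.matrix(coef(fit))[-1, 1]
print(round(b, 3))
```

At the start of the path the penalized x coefficients should all be zero, while the age and group coefficients are already non-zero, which matches what the plot above shows. Note that ncol(z) here counts the dummy columns, not the original variables, so a factor with k levels contributes k - 1 zeros to the penalty vector. Also, according to the glmnet documentation, the penalty factors are rescaled internally to sum to the number of variables, so the reported lambda values reflect that rescaling.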
