
glmnet gives different results with intercept=TRUE than with intercept=FALSE and penalty.factor=0 for an intercept column in x


I am new to glmnet and experimenting with the penalty.factor option. The vignette says that it "Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model," and the longer PDF document shows example code. So I expected that a regression with intercept = TRUE and no constant column in x would give the same result as one with intercept = FALSE and a constant column in x whose penalty.factor is 0. But the code below shows that it does not: in the latter case the intercept is 0 and the other two coefficients are about 20% larger than in the former.

library("glmnet")

set.seed(7)

# penalty for the intercept
intercept_penalty <- 0

# Simulate data with 2 features
num_regressors <- 2
num_observations <- 100
X <- matrix(rnorm(num_regressors * num_observations),
            ncol = num_regressors,
            nrow = num_observations)

# Add an intercept in the right-hand side matrix: X1 = (intercept + X)
X1 <- cbind(matrix(1, ncol = 1, nrow = num_observations), X)

# Set random parameters for the features
beta <- runif(1 + num_regressors)

# Generate observations for the left-hand side
Y <- X1 %*% beta + rnorm(num_observations) / 10

# run OLS
ols <- lm(Y ~ X)
coef_ols <- coef(ols)

# Run glmnet with an intercept in the command, not in the matrix
fit <- glmnet(y = Y,
              x = X,
              intercept = T,
              penalty.factor = rep(1, num_regressors),
              lambda = 0)
coef_intercept_equal_true <- coef(fit)

# run glmnet with an intercept in the matrix with a penalty
# factor of intercept_penalty for the intercept and 1 for the rest
fit_intercept_equal_false <- glmnet(y = Y,
              x = X1,
              intercept = F,
              penalty.factor = c(intercept_penalty, rep(1, num_regressors)),
              lambda = 0)
coef_intercept_equal_false <- coef(fit_intercept_equal_false)

# Compare all three methods in a data frame
# For lasso_intercept_equal_false, indexing starts at 2 because position 1
# of coef() is the intercept, which is zero in that fit
comparison <- data.frame(original = beta,
                         ols = coef_ols,
                         lasso_intercept_equal_true = coef_intercept_equal_true[1:length(coef_intercept_equal_true)],
                         lasso_intercept_equal_false = coef_intercept_equal_false[2:length(coef_intercept_equal_false)]
)

comparison$difference <- comparison$lasso_intercept_equal_false - comparison$lasso_intercept_equal_true
comparison

Furthermore, the discrepancy in this example is the same for different penalty factors on the intercept term, whether intercept_penalty is 0, 1, 3000, -10, etc., and it is similar with a positive penalty such as lambda = 0.01.
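
For reference, here is a quick check of that claim (a sketch that reuses X1, Y, and num_regressors from the script above; the choice of penalty factors and lambda = 0.01 is only for illustration):

for (intercept_penalty in c(0, 1, 3000)) {
  fit_check <- glmnet(y = Y,
                      x = X1,
                      intercept = F,
                      penalty.factor = c(intercept_penalty, rep(1, num_regressors)),
                      lambda = 0.01)
  # Per the behavior described above, these coefficients come out essentially
  # the same for every penalty factor
  print(t(as.matrix(coef(fit_check))))
}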

If this is not a bug, what is the proper usage of penalty.factor?


Solution

  • I contacted the author, who confirmed that this is a bug and added that it is on his list of bug fixes. In the meantime, a workaround is to center the regressors, e.g. with

    fit_centered <- glmnet(y = Y,
                           x = scale(X1, T, F),
                           intercept = F,
                           lambda = 0)
    

    And in this case, the penalty factor does not matter. Here is a revised script that compares OLS, LASSO with intercept, LASSO without intercept, and LASSO with centered regressors:

    library("glmnet")
    
    set.seed(7)
    
    # Simulate data with 2 features
    num_regressors <- 2
    num_observations <- 100
    X <- matrix(rnorm(num_regressors * num_observations),
                ncol = num_regressors,
                nrow = num_observations)
    
    # Add an intercept in the right-hand side matrix: X1 = (intercept + X)
    X1 <- cbind(matrix(1, ncol = 1, nrow = num_observations), X)
    
    # Set random parameters for the features
    beta <- runif(1 + num_regressors)
    
    # Generate observations for the left-hand side
    Y <- X1 %*% beta + rnorm(num_observations) / 10
    
    # run OLS
    ols <- lm(Y ~ X)
    coef_ols <- coef(ols)
    
    # Run glmnet with an intercept in the command, not in the matrix
    fit <- glmnet(y = Y,
                  x = X,
                  intercept = T,
                  penalty.factor = rep(1, num_regressors),
                  lambda = 0)
    coef_intercept <- coef(fit)
    
    # Run glmnet with an intercept column in the matrix (X1) and
    # intercept = FALSE, without any penalty.factor
    fit_no_intercept <- glmnet(y = Y,
                  x = X1,
                  intercept = F,
                  lambda = 0)
    coef_no_intercept <- coef(fit_no_intercept)
    
    # Run glmnet on centered regressors with intercept = FALSE.
    # If x is centered, it works (even though y is not centered). Center it with:
    #   X1 - matrix(colMeans(X1), nrow = num_observations, ncol = 1 + num_regressors, byrow = T)
    # or with
    #   X1_centered <- scale(X1, T, F)
    
    fit_centered <- glmnet(y = Y,
                          x = scale(X1, T, F),
                          intercept = F,
                          lambda = 0)
    coef_centered <- coef(fit_centered)
    
    # Compare all four methods in a data frame
    # For lasso_no_intercept and lasso_centered, indexing starts at 2 because
    # position 1 of coef() is the intercept, which is zero in those fits
    comparison <- data.frame(ols = coef_ols,
                             lasso_intercept = coef_intercept[1:length(coef_intercept)],
                             lasso_no_intercept = coef_no_intercept[2:length(coef_no_intercept)],
                             lasso_centered = coef_centered[2:length(coef_centered)]
    )
    
    
    comparison$diff_intercept <- comparison$lasso_intercept - comparison$lasso_no_intercept
    comparison$diff_centered <- comparison$lasso_centered - comparison$lasso_intercept
    comparison
    

    The output:

                      ols lasso_intercept lasso_no_intercept lasso_centered diff_intercept diff_centered
    (Intercept) 0.9748302       0.9748302          0.0000000      0.0000000      0.9748302 -9.748302e-01
    X1          0.6559541       0.6559541          0.7974851      0.6559541     -0.1415309  2.220446e-16
    X2          0.7986957       0.7986957          0.9344306      0.7986957     -0.1357348  4.440892e-16
    

    For LASSO with centered regressors, the estimated intercept is 0, but the other coefficients match those from the LASSO with intercept.
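
    If the intercept itself is needed, it can be recovered on the original scale after the centered fit. Here is a minimal sketch, assuming the objects from the revised script above and the usual relation intercept = mean(Y) - colMeans(X) %*% slopes:

    # Slope coefficients on the original X columns: positions 3 and 4 of coef(),
    # after the zero intercept and the zeroed-out constant column
    slopes <- coef_centered[3:length(coef_centered)]
    implied_intercept <- mean(Y) - sum(colMeans(X) * slopes)
    implied_intercept  # should be close to the OLS intercept (0.9748302 above)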