Tags: r, scale, glmnet, lasso-regression, standardization

"standardize = " option in glmnet package


I have a question about the standardize option in the glmnet package.
I understand that scaling or standardizing the dataset is needed in a regression analysis if the coefficients are to be comparable across predictors.
Usually, for an ordinary linear regression (e.g., using the glm function in R), I scale the dataset manually with scale() before fitting the model.
However, with the glmnet package (for regularized regression), it seems that the standardize option standardizes the dataset itself, so the coefficients come out meaningful (comparable) without any manual scaling. Am I correct?

If so, suppose I run the following code and the variable "x3" turns out to have the largest coefficient in absolute value. Can I then conclude that "x3" is the most important variable for discriminating between the categories?

I look forward to hearing any opinions. Thanks!

library(glmnet)   # cv.glmnet
library(caTools)  # sample.split

set.seed(12345)
example.dat <- data.frame(Category = rbinom(100, 1, 0.5),
                          x1 = rpois(100, 10),
                          x2 = rnorm(100, 3, 10),
                          x3 = rbeta(100, 8, 20),
                          x4 = rnorm(100, -3, 45),
                          x5 = rnorm(100, 1000, 10000))

# 70/30 train/test split; the indicator is not called "sample" so base::sample stays usable
in.train <- sample.split(example.dat$Category, SplitRatio = .70)
train    <- subset(example.dat, in.train)
test     <- subset(example.dat, !in.train)

set.seed(12345)
lasso.fit <- cv.glmnet(data.matrix(train[,-1]),
                       train[,1], 
                       family         = "binomial",
                       nfolds         = nrow(train), # LOOCV
                       grouped        = FALSE,
                       type.measure   = "class",
                       alpha          = 0.6,
                       standardize    = TRUE,
                       standardize.response = TRUE)  # documented for family = "mgaussian" only
print(lasso.fit)
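# Absolute coefficients at lambda.1se, sorted from largest to smallest
coef.abs   <- as.matrix(abs(coef(lasso.fit, s = "lambda.1se")))
coef.order <- coef.abs[order(coef.abs[, 1], decreasing = TRUE), , drop = FALSE]
rownames(coef.order)[coef.order[, 1] > 0]  # names of the nonzero coefficients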
# [1] "x3"          "(Intercept)"

Solution

  • A bit of a late response, but I hope it helps.

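    Keep in mind that when you use the built-in standardization option, glmnet still returns the coefficients on the original scale (see the excerpt from the docs below), so I would be careful about drawing that conclusion. A coefficient on the original scale depends on the units of its predictor: x3 is drawn from a Beta distribution and lies between 0 and 1, so a large coefficient on it may reflect its narrow range rather than its importance.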

    Whenever I want to compare coefficients on the same scale, I use scale() to standardize before running glmnet. That way, you can get scaled coefficients returned for your comparisons.

    standardize
    Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize.

    glmnet documentation
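
To make the two options above concrete, here is a minimal sketch rather than a definitive recipe. It uses made-up toy data (not the asker's split) and an arbitrary penalty value s = 0.01 chosen only for illustration. The first fit scales the predictors by hand with scale() and turns off glmnet's own standardization; the second lets glmnet standardize and then multiplies the original-scale coefficients by each predictor's standard deviation to get per-SD coefficients.

```r
library(glmnet)

set.seed(12345)
n <- 100
# Toy predictors on very different scales (illustrative data only)
x <- cbind(x1 = rpois(n, 10),
           x3 = rbeta(n, 8, 20),
           x5 = rnorm(n, 1000, 10000))
y <- rbinom(n, 1, 0.5)

# Approach 1: scale by hand, then switch off glmnet's own standardization
x.scaled    <- scale(x)
fit.scaled  <- glmnet(x.scaled, y, family = "binomial",
                      alpha = 0.6, standardize = FALSE)
# s = 0.01 is an arbitrary illustrative penalty; [-1, 1] drops the intercept
beta.scaled <- as.matrix(coef(fit.scaled, s = 0.01))[-1, 1]

# Approach 2: let glmnet standardize, then rescale the returned coefficients
fit.raw   <- glmnet(x, y, family = "binomial",
                    alpha = 0.6, standardize = TRUE)
beta.orig <- as.matrix(coef(fit.raw, s = 0.01))[-1, 1]  # original units
beta.std  <- beta.orig * apply(x, 2, sd)                # per-SD coefficients

print(cbind(beta.orig, beta.std, beta.scaled))
```

I would expect beta.std and beta.scaled to be close but not necessarily identical, since glmnet's internal scaling may not use exactly the same variance formula as scale(). For ranking predictors that does not matter, because a constant factor applied to every coefficient leaves their order unchanged. Even on a common scale, the size of a penalized coefficient is only a rough guide to importance.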