Confusion about 'standardize' option of glmnet package in R

I am confused about the standardize option of the glmnet package in R. I get different coefficients when I standardize the covariate matrix myself and set standardize=FALSE than when I leave the covariate matrix unstandardized and set standardize=TRUE. I assumed they would be the same! The example below shows both cases as ridge.mod1 and ridge.mod2. I also fitted a third model (ridge.mod3) that standardizes the outcome as well as the covariate matrix and uses standardize=FALSE, to check whether I also need to standardize the outcome to get the same coefficients as in ridge.mod1.

set.seed(1)
y <- rnorm(30, 20, 10) 
x1 <- rnorm(30, 5, 2)
x2 <- x1 + rnorm(30, 0, 5)
cor(x1,x2)
x <- as.matrix(cbind(x1,x2))
z1 <- scale(x1)
z2 <- scale(x2)
z <- as.matrix(cbind(z1,z2))
y.scale <- scale(y)
n <- 30
# Fixing foldid for proper comparison
foldid=sample(rep(seq(5),length=n))
table(foldid)

library(glmnet)
cv.ridge.mod1 <- cv.glmnet(x, y, alpha = 0, nfolds = 5, foldid=foldid, standardize = TRUE)
ridge.mod1 <- glmnet(x, y, alpha = 0, standardize = TRUE)
coef(ridge.mod1, s=cv.ridge.mod1$lambda.min)

> coef(ridge.mod1, s=cv.ridge.mod1$lambda.min)
3 x 1 sparse Matrix of class "dgCMatrix"
                       1
(Intercept) 2.082458e+01
x1          2.856136e-37
x2          4.334910e-38

cv.ridge.mod2 <- cv.glmnet(z, y, alpha = 0, nfolds = 5, foldid=foldid, standardize = FALSE)
ridge.mod2 <- glmnet(z, y, alpha = 0, standardize = FALSE)
coef(ridge.mod2, s=cv.ridge.mod2$lambda.min)

> coef(ridge.mod2, s=cv.ridge.mod2$lambda.min)
3 x 1 sparse Matrix of class "dgCMatrix"
                       1
(Intercept) 2.082458e+01
V1          4.391657e-37
V2          2.389751e-37

cv.ridge.mod3 <- cv.glmnet(z, y.scale, alpha = 0, nfolds = 5, foldid=foldid, standardize = FALSE)
ridge.mod3 <- glmnet(z, y.scale, alpha = 0, standardize = FALSE)
coef(ridge.mod3, s=cv.ridge.mod3$lambda.min)

> coef(ridge.mod3, s=cv.ridge.mod3$lambda.min)
3 x 1 sparse Matrix of class "dgCMatrix"
                       1
(Intercept) 1.023487e-16
V1          4.752255e-38
V2          2.585973e-38

Could anyone please explain what is going on here, and whether (and how) I can reproduce the coefficients of ridge.mod1 by standardizing in the data-processing step and then using standardize=FALSE?

Update: (what I tried based on the comments below)

So I tried standardizing by sqrt(SS/n) instead of sqrt(SS/(n-1)), and I also tried standardizing both y and x. Neither gave me coefficients equal to the de-standardized coefficients of model 1.

## Standardizing by sqrt(SS(X)/n), as glmnet does, instead of sqrt(SS(X)/(n-1)), which is what scale() uses
Xs <- apply(x, 2, function(m) (m - mean(m)) / sqrt(sum(m^2) / n))
Ys <- (y-mean(y)) / sqrt(sum(y^2) / n)

# Standardizing only X by sqrt(SS(X)/n)
cv.ridge.mod4 <- cv.glmnet(Xs, y, alpha = 0, nfolds = 5, foldid=foldid, standardize = FALSE)
ridge.mod4 <- glmnet(Xs, y, alpha = 0, standardize = FALSE)
coef(ridge.mod4, s=cv.ridge.mod4$lambda.min)

> coef(ridge.mod4, s=cv.ridge.mod4$lambda.min)[2]/sd(x1)
[1] 7.995171e-38
> coef(ridge.mod4, s=cv.ridge.mod4$lambda.min)[3]/sd(x2)
[1] 2.957854e-38

# Standardizing both Y and X by sqrt(SS(X)/n), but neither is centered
cv.ridge.mod6 <- cv.glmnet(Xs.noncentered, Ys.noncentered, alpha = 0, nfolds = 5, foldid=foldid, standardize = FALSE)
ridge.mod6 <- glmnet(Xs.noncentered, Ys.noncentered, alpha = 0, standardize = FALSE)
coef(ridge.mod6, s=cv.ridge.mod6$lambda.min)

> coef(ridge.mod6, s=cv.ridge.mod6$lambda.min)[2] / (sqrt(sum(x1^2) / n))
[1] 1.019023e-39
> coef(ridge.mod6, s=cv.ridge.mod6$lambda.min)[3] / (sqrt(sum(x2^2) / n))
[1] 9.189263e-40

What is still wrong here?

Solution

I tweaked your code to work with a more sensible problem, one where y actually depends on the covariates. To reproduce the standardize=TRUE coefficients with standardize=FALSE, you first need to standardize the variables yourself using the (1/N) variance estimator, computed about the mean: sqrt(sum((x - mean(x))^2)/N). (The update in the question divides by sum(m^2)/n, which is not centered.) For this example I also centered the variables to get rid of the constant, and I focus only on the coefficients of the variables. Next, note that $\beta_{std}^{(j)}=\beta^{(j)}\frac{s_x^{(j)}}{s_y}$, so you have to invert that formula to get the de-standardized coefficients: $\beta^{(j)}=\beta_{std}^{(j)}\frac{s_y}{s_x^{(j)}}$. I do that in the following code.
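For reference, here is a short sketch of where that relation comes from, assuming every variable is centered and scaled as described (the symbols $z^{(j)}$, $\tilde{y}$ and $\beta_0$ are introduced here only for illustration). Write $z^{(j)} = (x^{(j)} - \bar{x}^{(j)})/s_x^{(j)}$ and $\tilde{y} = (y - \bar{y})/s_y$. Substituting these into the fitted model on the standardized scale gives:

$$
\begin{aligned}
\tilde{y} &= \sum_j \beta_{std}^{(j)} z^{(j)} \\
y &= \bar{y} + \sum_j \beta_{std}^{(j)} \frac{s_y}{s_x^{(j)}} \left(x^{(j)} - \bar{x}^{(j)}\right) \\
\beta^{(j)} &= \beta_{std}^{(j)} \frac{s_y}{s_x^{(j)}}, \qquad \beta_0 = \bar{y} - \sum_j \beta^{(j)} \bar{x}^{(j)}
\end{aligned}
$$

The last line is the same relation as above, solved for $\beta^{(j)}$, plus the intercept on the original scale.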

set.seed(1)

x1 <- rnorm(300, 5, 2)
x2 <- x1 + rnorm(300, 0, 5)
x3 <- rnorm(300, 6, 5)
e= rnorm(300, 0, 1)
y <- 0.3*x1+3.5*x2+x3+e

x <- as.matrix(cbind(x1,x2,x3))


# Standard deviation with the 1/N divisor, instead of the 1/(N-1) used by sd() and scale()
sdN=function(x){
  sigma=sqrt( (1/length(x)) * sum((x-mean(x))^2))
  return(sigma)
}

n=300
foldid=sample(rep(seq(5),length=n))

g1=(x1-mean(x1))/sdN(x1)
g2=(x2-mean(x2))/sdN(x2)
g3=(x3-mean(x3))/sdN(x3)
gy=(y-mean(y))/sdN(y)
equis <- as.matrix(cbind(g1,g2,g3))



library(glmnet)
cv.ridge.mod1 <- cv.glmnet(x, y, alpha = 0, nfolds = 5, foldid=foldid,standardize = TRUE)
coef(cv.ridge.mod1, s=cv.ridge.mod1$lambda.min)


cv.ridge.mod2 <- cv.glmnet(equis, gy, alpha = 0, nfolds = 5, foldid=foldid, standardize = FALSE, intercept=FALSE)
beta=coef(cv.ridge.mod2, s=cv.ridge.mod2$lambda.min)


# De-standardize: beta = beta_std * s_y / s_x (beta[1] is the intercept, zero here)
beta[2]*sdN(y)/sdN(x1)
beta[3]*sdN(y)/sdN(x2)
beta[4]*sdN(y)/sdN(x3)

coef(cv.ridge.mod1, s=cv.ridge.mod1$lambda.min)

This yields the following results:

> beta[2]*sdN(y)/sdN(x1)
[1] 0.5984356
> beta[3]*sdN(y)/sdN(x2)
[1] 3.166033
> beta[4]*sdN(y)/sdN(x3)
[1] 0.9145646
> 
> coef(cv.ridge.mod1, s=cv.ridge.mod1$lambda.min)
4 x 1 sparse Matrix of class "dgCMatrix"
                    1
(Intercept) 0.5951423
x1          0.5984356
x2          3.1660328
x3          0.9145646

As you can see, the coefficients agree to at least four decimal places. I hope this answers your question.
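If you need this more than once, the back-transformation can be wrapped in a small helper that also recovers the intercept, which the code above leaves out. This is only a sketch: `sd_n` and `destandardize` are hypothetical names, not glmnet functions. The sanity check uses lm(), i.e. the unpenalized case, purely to verify the algebra with base R; it says nothing about glmnet's internals.

```r
# Standard deviation with the 1/n divisor (not 1/(n - 1) as in sd()).
sd_n <- function(v) sqrt(mean((v - mean(v))^2))

# Hypothetical helper: map slopes fitted on centered and scaled (x, y)
# back to the original scale, and recover the intercept.
destandardize <- function(beta_std, x, y) {
  sx <- apply(x, 2, sd_n)
  slopes <- as.numeric(beta_std) * sd_n(y) / sx
  intercept <- mean(y) - sum(slopes * colMeans(x))
  c("(Intercept)" = intercept, setNames(slopes, colnames(x)))
}

# Sanity check with ordinary least squares (no penalty), base R only.
set.seed(1)
x <- cbind(x1 = rnorm(50, 5, 2), x2 = rnorm(50, 0, 5))
y <- 1 + 0.3 * x[, "x1"] + 3.5 * x[, "x2"] + rnorm(50)

# Center and scale every column of x, and y, with the 1/n divisor.
z  <- apply(x, 2, function(v) (v - mean(v)) / sd_n(v))
zy <- (y - mean(y)) / sd_n(y)

beta_std     <- coef(lm(zy ~ z - 1))   # no intercept: everything is centered
raw_from_std <- destandardize(beta_std, x, y)
raw_direct   <- coef(lm(y ~ x))

# Both routes should give the same coefficients up to floating-point error.
print(all.equal(unname(raw_from_std), unname(raw_direct)))
```

With the ridge fit above, the call would be something like destandardize(beta[2:4], x, y), which skips the zero intercept entry returned by coef().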