Are the coefficients in glmnet's default trace plot standardized? How can I tell? If they are not, how can I make a plot with standardized coefficients?
library(glmnet)
set.seed(123)
lambdas <- 10^seq(3, -2, by = -.1)
cv.ridge <- cv.glmnet(x_train_r, y_train_r, alpha = 0, family = "binomial", lambda = lambdas)
plot(cv.ridge$glmnet.fit, "lambda", label = TRUE)
This produces a trace plot of the coefficients. Are they standardized?
The coefficients are not standardized; see this post as well. You can check this by multiplying your non-standardized predictors by the coefficients and comparing the result with the model's predictions:
library(glmnet)
library(mlbench)
data(Sonar)
X=as.matrix(Sonar[,1:10])
y=as.numeric(Sonar$Class)-1
fit = cv.glmnet(X,y,alpha = 0, family = "binomial")
The coefficients in the trace plot are on a scale too large for them to be standardized:
plot(fit$glmnet.fit,"lambda")
We can double-check by reproducing the predictions by hand:
Co = coef(fit,s="lambda.1se")
our_pred = cbind(1,X) %*% as.matrix(Co)
y_pred = predict(fit,X,s="lambda.1se")
table(our_pred == y_pred)
TRUE
208
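Exact equality on floating-point numbers can be brittle, so the same check can also be written with a tolerance. This is only a sketch: the names manual_link, glmnet_link and same_scale are mine, and type = "link" is spelled out so that both sides are on the linear-predictor scale.

```r
library(glmnet)
library(mlbench)

data(Sonar)
X <- as.matrix(Sonar[, 1:10])
y <- as.numeric(Sonar$Class) - 1

set.seed(123)  # cv.glmnet assigns folds at random
fit <- cv.glmnet(X, y, alpha = 0, family = "binomial")

# Intercept plus slopes at lambda.1se, as a dense one-column matrix
Co <- as.matrix(coef(fit, s = "lambda.1se"))

# Linear predictor computed by hand from the raw (unscaled) predictors
manual_link <- as.numeric(cbind(1, X) %*% Co)

# Linear predictor as returned by glmnet
glmnet_link <- as.numeric(predict(fit, newx = X, s = "lambda.1se", type = "link"))

# Tolerance-based comparison instead of ==
same_scale <- isTRUE(all.equal(manual_link, glmnet_link))
max_abs_diff <- max(abs(manual_link - glmnet_link))
```

If same_scale is TRUE, the coefficients apply directly to the raw predictors, which is what being on the original scale means.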
So the coefficients are converted back to the original scale. To make a plot with standardized coefficients solely for visualization, we can multiply each coefficient by the standard deviation of its predictor, because b * x = (b * sd(x)) * (x / sd(x)). For the full derivation of the scaled coefficients, see the answer by @MatthewDury:
# column standard deviations
col_SD = apply(X,2,sd)
# rows of beta are predictors, columns are lambda values;
# multiply each row by the SD of its predictor
Co = sweep(as.matrix(fit$glmnet.fit$beta),1,col_SD,"*")
#cols = RColorBrewer::brewer.pal(nrow(Co),"Set3")
l = fit$glmnet.fit$lambda
names(l) = colnames(Co)
library(ggplot2)
library(reshape2)
library(ggrepel)
df = melt(as.matrix(Co))
df$lambda = l[as.character(df$Var2)]
ggplot(df,aes(x=lambda,y=value,col=Var1)) +
  geom_line() + scale_x_log10() +
  geom_label_repel(data=subset(df,lambda==min(l)),
                   aes(x=lambda,y=value,label=Var1),nudge_x=-0.1,show.legend=FALSE)
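Another way to get a standardized trace plot, and to sanity-check the rescaling, is to scale the predictors yourself and refit with standardize = FALSE, so that glmnet's own plot method shows standardized coefficients. The sketch below rests on one assumption: as far as I know, glmnet standardizes internally with the 1/n standard deviation rather than the 1/(n - 1) one that sd() returns, so sd_n applies that correction. The names sd_n, fit_orig, fit_std and max_diff are mine.

```r
library(glmnet)
library(mlbench)

data(Sonar)
X <- as.matrix(Sonar[, 1:10])
y <- as.numeric(Sonar$Class) - 1

# ASSUMPTION: glmnet's internal SD divides by n; sd() divides by n - 1
n <- nrow(X)
sd_n <- apply(X, 2, sd) * sqrt((n - 1) / n)

# Usual fit: predictors standardized internally, coefficients
# returned on the original scale
fit_orig <- glmnet(X, y, alpha = 0, family = "binomial")

# Refit on pre-scaled predictors with the same lambda sequence and
# internal standardization switched off
X_std <- scale(X, center = TRUE, scale = sd_n)
fit_std <- glmnet(X_std, y, alpha = 0, family = "binomial",
                  standardize = FALSE, lambda = fit_orig$lambda)

# Original-scale coefficients times SD (the vector recycles down the
# rows, so row i is multiplied by sd_n[i])
beta_rescaled <- as.matrix(fit_orig$beta) * sd_n

# Compare on the lambda values both fits share
k <- min(ncol(beta_rescaled), ncol(fit_std$beta))
max_diff <- max(abs(beta_rescaled[, 1:k] - as.matrix(fit_std$beta)[, 1:k]))

# Built-in trace plot, now showing standardized coefficients
plot(fit_std, xvar = "lambda", label = TRUE)
```

If the assumption holds, max_diff should be small, limited only by the solver's convergence tolerance. For a plot alone, plain sd() gives a nearly identical picture, since the two standard deviations differ by a constant factor close to 1.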