
Do the default variable trace plots of glmnet use standardized coefficients?


Are the default variable trace plots of glmnet drawn with standardized coefficients? How can I tell? If not, how can I make one?

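library(glmnet)

set.seed(123)

# lambda grid from 10^3 down to 10^-2
lambdas <- 10^seq(3, -2, by = -.1)

# ridge-penalized logistic regression (alpha = 0)
cv.ridge <- cv.glmnet(x_train_r, y_train_r, alpha = 0, family = "binomial", lambda = lambdas)

# coefficient trace plot against log(lambda)
plot(cv.ridge$glmnet.fit, "lambda", label = TRUE)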

Trace plot with the coefficients. Are they standardized?


Solution

  • The coefficients are not standardized, see this post as well. You can easily check this by multiplying the coefficients with your non-standardized predictors and comparing the result with the model's predictions. Using an example dataset:

    library(glmnet)
    library(mlbench)
    data(Sonar)
    X=as.matrix(Sonar[,1:10])
    y=as.numeric(Sonar$Class)-1
    fit = cv.glmnet(X,y,alpha = 0, family = "binomial")
    

    The coefficients in the trace plot are far too large in magnitude to be on a standardized scale:

    plot(fit$glmnet.fit,"lambda")
    


    We can double check:

    Co = coef(fit,s="lambda.1se")
    our_pred = cbind(1,X) %*% as.matrix(Co)
    y_pred = predict(fit,X,s="lambda.1se")
    
    table(our_pred == y_pred)
    
    TRUE 
     208
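
    Exact `==` on floating-point predictions can be brittle, so the same check can also be done up to numerical tolerance. The following is a minimal sketch on simulated data; `X_sim`, `y_sim`, `fit_sim` and the other names are hypothetical and not part of the example above:

    ```r
    library(glmnet)

    # Simulated data (hypothetical, for illustration only)
    set.seed(1)
    X_sim <- matrix(rnorm(200 * 5, sd = 0.1), nrow = 200,
                    dimnames = list(NULL, paste0("V", 1:5)))
    y_sim <- rbinom(200, 1, plogis(10 * X_sim[, 1] - 5 * X_sim[, 2]))

    fit_sim <- cv.glmnet(X_sim, y_sim, alpha = 0, family = "binomial")

    # Linear predictor rebuilt by hand from the reported coefficients
    co_sim <- as.matrix(coef(fit_sim, s = "lambda.1se"))
    manual <- drop(cbind(1, X_sim) %*% co_sim)

    # Linear predictor returned by predict() on the raw (unscaled) predictors
    from_fit <- drop(predict(fit_sim, newx = X_sim, s = "lambda.1se", type = "link"))

    # Compare up to numerical tolerance instead of exact equality
    check_ok <- isTRUE(all.equal(unname(manual), unname(from_fit)))
    print(check_ok)
    ```

    If the reported coefficients were on the standardized scale, multiplying them with the raw predictors would not reproduce the predictions.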
    

    So the coefficients are converted back to the original scale. To make a plot with standardized coefficients solely for visualization, we can multiply each coefficient by the standard deviation of its predictor, which turns a change per unit of the predictor into a change per standard deviation. For the full derivation of the scaled coefficients, see the answer by @MatthewDury:

    # column standard deviations of the predictors
    col_SD = apply(X,2,sd)
    
    # rescale each row of the coefficient matrix (one row per predictor) by its SD
    Co = sweep(as.matrix(fit$glmnet.fit$beta),1,col_SD,"*")
    #cols = RColorBrewer::brewer.pal(nrow(Co),"Set3")
    l = fit$glmnet.fit$lambda
    names(l) = colnames(Co)
    
    library(ggplot2)
    library(reshape2)
    library(ggrepel)
    
    df = melt(as.matrix(Co))
    df$lambda = l[as.character(df$Var2)]
    
    ggplot(df,aes(x=lambda,y=value,col=Var1)) + 
    geom_line() + scale_x_log10() +
    geom_label_repel(data=subset(df,lambda==min(l)),
    aes(x=lambda,y=value,label=Var1),nudge_x=-0.1,show.legend=FALSE)
    

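
    The relationship between original-scale and standardized coefficients can be checked directly: a slope per unit of a predictor, multiplied by that predictor's standard deviation, gives the slope per standard deviation. The sketch below uses an unpenalized `lm` fit on the built-in `mtcars` data instead of glmnet, purely to isolate the rescaling identity; the dataset and all variable names are assumptions for illustration:

    ```r
    # Unpenalized linear model on a built-in dataset,
    # used only to show the rescaling identity (not a glmnet fit)
    X_raw <- as.matrix(mtcars[, c("wt", "hp", "disp")])
    y_raw <- mtcars$mpg

    fit_raw <- lm(y_raw ~ X_raw)         # slopes on the original scale
    fit_std <- lm(y_raw ~ scale(X_raw))  # slopes per one standard deviation

    # Sample standard deviations, the same ones scale() divides by
    sd_x <- apply(X_raw, 2, sd)

    # Original-scale slopes times the predictor SDs vs. slopes from the refit
    rescaled <- unname(coef(fit_raw)[-1] * sd_x)
    refit    <- unname(coef(fit_std)[-1])

    identity_holds <- isTRUE(all.equal(rescaled, refit))
    print(identity_holds)
    ```

    With a penalty the refit would not match exactly unless the same standardization is used inside the fit, so for glmnet this rescaling is best treated as a visualization aid.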