Tags: r, feature-selection, glmnet, tidymodels

Extract feature importance from a penalized logistic regression model


I used this example from the tidymodels website ( https://www.tidymodels.org/start/case-study/ ) on my own data. In contrast to that example, my data show that penalized logistic regression outperforms random forest in terms of accuracy. However, the example does not describe how to assess feature importance from the penalized logistic regression (glmnet) model. My question is whether this model selects some predictors to enter into the model. If yes, how do you determine which features are selected, and how do you find the importance of the features from the penalized logistic regression (glmnet) model? Thank you very much for your answer.


Solution

  • If you follow the example in the link closely, you will notice they set mixture = 1, which means glmnet is running a lasso. You should most likely tune your penalty term, but in the end the coefficients that are non-zero are the ones selected. You can read the glmnet help pages; I think they cover this pretty well.

    Using a minimal example, I set the penalty (lambda) to 0.01, and you can see which coefficients are non-zero:

    library(tidymodels)
    library(mlbench)
    
    data(Sonar)
    
    # Fit a lasso (mixture = 1) logistic regression with a fixed penalty
    lr_mod <- 
      logistic_reg(penalty = 0.01, mixture = 1) %>% 
      set_engine("glmnet") %>%
      fit(Class ~ ., data = Sonar)
    
    # Extract the coefficients at the chosen value of lambda
    reg_coef <- coef(lr_mod$fit, s = 0.01)
    

    These have positive coefficients and are selected (including the intercept). Note that the filter `> 0` misses selected predictors with negative coefficients; to get every selected term, use `reg_coef[reg_coef[, 1] != 0, ]`:

    reg_coef[reg_coef[,1]>0,]
     (Intercept)           V3           V7           V8          V14          V16 
     4.515343499  7.554970397  4.682233799  3.056506356  0.003707144  2.506047798 
             V31          V36          V37          V40          V50          V55 
     3.069514694  2.271717076  1.270513186  1.697256135 32.854319954 12.996429503 
             V57 
    36.520376537 
    

    These were shrunk to exactly zero and are kicked out:

    reg_coef[reg_coef[,1]==0,]
     V2  V5  V6 V10 V13 V15 V17 V18 V19 V25 V26 V27 V33 V34 V35 V38 V41 V42 V43 V46 
      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
    V47 V53 V56 V60 
      0   0   0   0
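To turn selection into a rough importance ranking, you can sort the surviving predictors by the absolute size of their coefficients. This is a sketch, assuming the `lr_mod` fit from above and that the Sonar predictors are on comparable scales (otherwise standardize them first, e.g. with a `step_normalize()` recipe):

```r
library(tidymodels)
library(mlbench)

data(Sonar)

# Refit the lasso logistic regression from above
lr_mod <-
  logistic_reg(penalty = 0.01, mixture = 1) %>%
  set_engine("glmnet") %>%
  fit(Class ~ ., data = Sonar)

# Keep every non-zero coefficient (positive and negative),
# drop the intercept, and rank by absolute magnitude
coefs    <- coef(lr_mod$fit, s = 0.01)
nonzero  <- coefs[coefs[, 1] != 0, ]
nonzero  <- nonzero[names(nonzero) != "(Intercept)"]
importance <- sort(abs(nonzero), decreasing = TRUE)
importance
```

The `vip` package offers a similar ranking for parsnip/glmnet fits (absolute standardized coefficients), if you prefer a ready-made plot.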
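Since the penalty should really be tuned rather than fixed at 0.01, here is a sketch of how that might look in the tidymodels workflow style of the linked case study; the fold count, grid range, and seed are arbitrary choices, not values from the original post:

```r
library(tidymodels)
library(mlbench)

data(Sonar)

set.seed(123)
folds <- vfold_cv(Sonar, v = 5, strata = Class)

# Lasso logistic regression with the penalty left to be tuned
lr_spec <- logistic_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

lr_wf <- workflow() %>%
  add_model(lr_spec) %>%
  add_formula(Class ~ .)

# Small log-spaced grid of candidate penalties
grid <- tibble(penalty = 10^seq(-4, -1, length.out = 30))

res  <- tune_grid(lr_wf, resamples = folds, grid = grid)
best <- select_best(res, metric = "roc_auc")

# Finalize with the best penalty and inspect the surviving terms
final_fit <- finalize_workflow(lr_wf, best) %>% fit(data = Sonar)
tidy(final_fit) %>% filter(estimate != 0)
```

Whatever penalty the tuning selects, the interpretation is the same as above: terms with non-zero estimates are the selected features.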