I am using this example from the tidymodels website ( https://www.tidymodels.org/start/case-study/ ) with my own data. In contrast to that example, my data show that penalized logistic regression outperforms random forest in terms of accuracy. However, the example does not describe how to assess feature importance from the penalized logistic regression (glmnet) model. My question is: does this model select a subset of the predictors to enter into the model? If yes, how do you determine which features are selected, and how do you find out the importance of those features from the penalized logistic regression (glmnet) fit? Thank you very much for your answer.
If you are really following the example in the link, you will notice they set mixture = 1, which means glmnet is fitting a lasso. You should most likely tune your penalty term, but in the end, the predictors whose coefficients are non-zero are the ones selected. You can read the glmnet help page / vignette on this; I think it covers it pretty well.
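Since the penalty is worth tuning rather than fixing, here is a sketch of how that could look with tune_grid(); the grid bounds, 5-fold CV, and roc_auc metric are illustrative choices of mine, not something from the case study:

```r
library(tidymodels)
library(mlbench)

data(Sonar)

# Lasso spec with the penalty left free for tuning
lr_spec <- logistic_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

# 5-fold CV, stratified on the outcome (illustrative choices)
folds <- vfold_cv(Sonar, v = 5, strata = Class)
grid  <- tibble(penalty = 10^seq(-4, -1, length.out = 20))

lr_res <- workflow() %>%
  add_model(lr_spec) %>%
  add_formula(Class ~ .) %>%
  tune_grid(resamples = folds, grid = grid, metrics = metric_set(roc_auc))

# Pick the penalty with the best cross-validated AUC
select_best(lr_res, metric = "roc_auc")
```

With the best penalty in hand, you would finalize the workflow and refit before inspecting coefficients.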
Using a minimal example, I set the penalty (i.e. lambda) to 0.01, and you can see which coefficients are non-zero:
library(tidymodels)
library(mlbench)
data(Sonar)

# Lasso (mixture = 1) logistic regression at a fixed penalty
lr_mod <-
  logistic_reg(penalty = 0.01, mixture = 1) %>%
  set_engine("glmnet") %>%
  fit(Class ~ ., data = Sonar)

# Coefficients from the underlying glmnet fit at s (lambda) = 0.01
reg_coef <- coef(lr_mod$fit, s = 0.01)
These are selected (including the intercept). Note that strictly a predictor is selected whenever its coefficient is non-zero, so the filter should be reg_coef[reg_coef[, 1] != 0, ]; the filter on > 0 below shows only the selected predictors with positive coefficients:
reg_coef[reg_coef[, 1] > 0, ]
(Intercept) V3 V7 V8 V14 V16
4.515343499 7.554970397 4.682233799 3.056506356 0.003707144 2.506047798
V31 V36 V37 V40 V50 V55
3.069514694 2.271717076 1.270513186 1.697256135 32.854319954 12.996429503
V57
36.520376537
These are kicked out (coefficients shrunk to exactly zero):
reg_coef[reg_coef[, 1] == 0, ]
V2 V5 V6 V10 V13 V15 V17 V18 V19 V25 V26 V27 V33 V34 V35 V38 V41 V42 V43 V46
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V47 V53 V56 V60
0 0 0 0
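As for importance, a common approach is to rank the selected predictors by the absolute size of their coefficients. A minimal sketch (my own addition, not part of the case study), using parsnip's tidy() method for glmnet fits, which returns one coefficient per term at the model's penalty:

```r
library(tidymodels)
library(mlbench)

data(Sonar)

lr_mod <- logistic_reg(penalty = 0.01, mixture = 1) %>%
  set_engine("glmnet") %>%
  fit(Class ~ ., data = Sonar)

# One row per term at the penalty stored in the spec; drop the intercept
# and the predictors shrunk to exactly zero, then rank by |coefficient|
importance <- tidy(lr_mod) %>%
  filter(term != "(Intercept)", estimate != 0) %>%
  arrange(desc(abs(estimate)))

importance
```

Ranking by |coefficient| is only a fair importance measure when the predictors are on comparable scales (the Sonar bands roughly are); the vip package applies the same idea for you via vi_model() if you prefer a ready-made helper.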