How to do feature selection with tidymodels

I have a logistic regression model that I built with tidymodels (R), and I want to do feature selection. How can I do this within the tidymodels framework using only packages published on CRAN (no development packages, please)?

Everyone just says to use regularized logistic regression, but I need to do inference, i.e. get confidence intervals for the parameters, and a regularized fit doesn't give me those.


  • We (the tidymodels group) are working on more supervised filtering methods, planned for later in 2023. In the meantime, the recipeselectors package is a great tool to use.

    One thing though... the standard errors and p-values are most likely not valid if you have searched through a large number of models. The results would be, to some unknown extent, overly optimistic.

    You could bootstrap the whole selection process a large number of times and estimate confidence intervals for the parameters from the resampled fits. A big potential issue is that those estimates are probably bimodal: in some percentage of the resamples a predictor is not selected, so its estimate is exactly zero and the distribution has a spike there.

    I think that one of the cleanest approaches is to use a Bayesian spike-and-slab model. You can get excellent inferences from it. It may be computationally expensive, but so is a wrapper method for feature selection.
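
The bootstrap idea in the answer above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the answerer's method: the data are simulated, backward stepwise selection with `step()` stands in for whatever selection procedure you actually use, and `fit_one_resample`, `n_boot`, and the variable names are made up for the example. The point it shows is that selection is repeated inside every resample, and that a predictor that was not selected contributes an exact zero.

```r
# Illustrative sketch only: base R, simulated data, and backward stepwise
# selection standing in for the real selection method.
set.seed(123)

# Simulated data (an assumption): x1 and x2 affect y, x3 and x4 are noise.
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * dat$x1 - 0.8 * dat$x2))

predictors <- c("x1", "x2", "x3", "x4")

# Run the whole pipeline (selection, then the final fit) on one bootstrap
# resample. Predictors that are not selected get a coefficient of exactly 0.
fit_one_resample <- function(data) {
  boot <- data[sample(nrow(data), replace = TRUE), ]
  full <- glm(y ~ x1 + x2 + x3 + x4, data = boot, family = binomial())
  selected <- step(full, direction = "backward", trace = 0)
  est <- setNames(numeric(length(predictors)), predictors)
  kept <- intersect(names(coef(selected)), predictors)
  est[kept] <- coef(selected)[kept]
  est
}

n_boot <- 200  # hypothetical value; use more resamples in practice
boot_est <- t(replicate(n_boot, fit_one_resample(dat)))

# Share of resamples in which each predictor was selected
selection_rate <- colMeans(boot_est != 0)

# Percentile intervals over all resamples, zeros included
percentile_ci <- t(apply(boot_est, 2, quantile, probs = c(0.025, 0.975)))

print(round(selection_rate, 2))
print(round(percentile_ci, 2))
```

Plotting `hist(boot_est[, "x3"])` for a noise predictor should make the bimodality concern visible: a spike at zero plus a spread of nonzero estimates. How to summarize such a distribution is a judgment call that this sketch does not settle. In a tidymodels workflow the resampling loop could presumably be driven by `rsample::bootstraps()` instead of `sample()`.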