I am trying to predict y, a column of 0s and 1s (classification), using features (X). I'm using ML models like XGBoost.
One of my features, in reality, is highly predictive, let's call it X1. X1 is a column of -1/0/1. When X1 = 1, 80% of the time y = 1. When X1 = -1, 80% of the time y = 0. When X1 = 0, it has no correlation with y.
So in reality, ML aside, any sane person would select this feature for their model, because if you see X1 = 1 or X1 = -1 you have an 80% chance of correctly predicting whether y is 0 or 1.
However X1 is only -1 or 1 about 5% of the time, and is 0 95% of the time. When I run it through feature selection techniques like Sequential Feature Selection, it doesn't get chosen! And I can understand why ML doesn't choose it, because 95% of the time it is a 0 (and thus uncorrelated with y). And so for any score that I've come across, models with X1 don't score well.
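To make the numbers concrete, here is a minimal simulation sketch of the setup. The even split between -1 and 1 and the 50% base rate for y when X1 = 0 are assumptions, not something I stated above, and the variable names are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# X1 is 0 about 95% of the time; -1 and 1 share the remaining 5% evenly (assumption).
x1 = rng.choice([-1, 0, 1], size=n, p=[0.025, 0.95, 0.025])

# P(y = 1) is 0.8 when X1 = 1, 0.2 when X1 = -1, and 0.5 when X1 = 0 (assumed base rate).
p_y1 = np.select([x1 == 1, x1 == -1], [0.8, 0.2], default=0.5)
y = (rng.random(n) < p_y1).astype(int)

# Accuracy of the rule "predict 1 if X1 == 1, predict 0 if X1 == -1" on non-zero rows only.
nonzero = x1 != 0
rule_pred = (x1 == 1).astype(int)
acc_nonzero = (rule_pred[nonzero] == y[nonzero]).mean()

# Accuracy over all rows, where rows with X1 == 0 simply get a constant guess of 1.
overall_pred = np.where(x1 == -1, 0, 1)
acc_overall = (overall_pred == y).mean()

# Unconditional linear correlation between X1 and y.
corr = np.corrcoef(x1, y)[0, 1]

print(f"accuracy when X1 != 0: {acc_nonzero:.3f}")
print(f"accuracy on all rows:  {acc_overall:.3f}")
print(f"corr(X1, y):           {corr:.3f}")
```

Under these assumptions the expected accuracy on all rows is 0.95 * 0.5 + 0.05 * 0.8 = 0.515, so an aggregate score barely moves even though the rule is right about 80% of the time on the rows where X1 is non-zero.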
So my question is more generically, how can one deal with this paradox between ML technique and real-life logic? What can I do differently in ML feature selection/modelling to take advantage of the information embedded in the X1 -1's and 1's, which I know (in reality) are highly predictive? What feature selection technique would have spotted the predictive power of X1, if we didn't know anything about it? So far, all methods that I know of need predictive power to be unconditional. Instead, here X1 is highly predictive conditional on not being 0 (which is only 5% of the time). What methods are out there to capture this?
Many thanks for any insight!
sklearn.feature_selection.RFE would probably be a good option, since it does not depend on a separate scoring criterion chosen for feature selection. What I mean by that is that it repeatedly fits the estimator you're planning to use on smaller and smaller subsets of features, each time removing the features the fitted estimator ranks lowest (via its coef_ or feature_importances_), until the desired number of features is reached.
This seems like a good approach, since regardless of whether the feature in question looks like a good predictor to you, this feature selection method tells you how important the feature is to the model. So if a feature is not selected, it is less relevant to the model in question than the features that were kept.
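A minimal sketch of what that could look like on synthetic data shaped like the question's X1. The data-generating assumptions (even split between -1 and 1, 50% base rate when X1 = 0, five pure-noise columns) are mine, and LogisticRegression is only a stand-in for whichever estimator you actually plan to use; with a tree ensemble such as XGBoost the ranking may well come out differently.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Synthetic X1: 0 for about 95% of rows, -1 or 1 otherwise (even split is an assumption).
x1 = rng.choice([-1, 0, 1], size=n, p=[0.025, 0.95, 0.025])

# y follows the 80% pattern when X1 is non-zero, and a 50% base rate otherwise (assumption).
p_y1 = np.select([x1 == 1, x1 == -1], [0.8, 0.2], default=0.5)
y = (rng.random(n) < p_y1).astype(int)

# Five hypothetical noise columns that carry no information about y.
noise = rng.normal(size=(n, 5))
X = np.column_stack([x1, noise])
feature_names = ["X1"] + [f"noise_{i}" for i in range(5)]

# RFE refits the estimator and drops the lowest-ranked feature one at a time (step=1)
# until n_features_to_select remain.
selector = RFE(estimator=LogisticRegression(), n_features_to_select=2, step=1)
selector.fit(X, y)

# support_ marks the kept features; ranking_ is 1 for kept features, larger for earlier drops.
for name, kept, rank in zip(feature_names, selector.support_, selector.ranking_):
    print(f"{name}: kept={kept}, rank={rank}")
```

Swapping in your own estimator only requires that it exposes coef_ or feature_importances_ after fitting, which is what RFE uses by default to rank features.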