python, machine-learning, scikit-learn, feature-selection

How does SelectFromModel from scikit-learn select features?


When I use XGBClassifier with SelectFromModel, the algorithm always returns around five features, regardless of the max_features value.

My question is: does XGBClassifier think that there are only five useful features in my dataset?

from sklearn.feature_selection import SelectFromModel
from xgboost import XGBClassifier

# X is the feature matrix and y the target vector (defined elsewhere)
sf = SelectFromModel(XGBClassifier(), max_features=10).fit(X, y)

# The output contains only five True values; all the rest are False
print(sf.get_support())

Solution

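  • To only select based on max_features, set threshold=-np.inf.

    I found the above text in the scikit-learn documentation for sklearn.feature_selection.SelectFromModel. It means that the threshold parameter takes priority: SelectFromModel keeps only the features whose importance passes the threshold, and max_features is just an upper limit on how many of those are kept. When threshold is left at its default of None, the mean of the feature importances is used for estimators without an L1 penalty, such as XGBClassifier. That is why only about five features are returned even with max_features=10: they are the ones whose importance is at or above the mean.

    If you want max_features alone to decide the result, set threshold=-np.inf. Every feature then passes the threshold, and max_features selects the requested number of features by importance rank.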
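    The difference can be checked with a minimal sketch like the one below. It makes two assumptions that are not in the question: RandomForestClassifier stands in for XGBClassifier so that the example depends only on scikit-learn, and the data is a synthetic set from make_classification. The exact number of features kept by the default threshold will differ on your data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data (an assumption for illustration): 25 features, 5 informative.
X, y = make_classification(
    n_samples=300, n_features=25, n_informative=5, random_state=0
)

# Default threshold (None): for this estimator the mean importance is used,
# so max_features=10 is only an upper limit on the number of features kept.
default_sel = SelectFromModel(
    RandomForestClassifier(n_estimators=50, random_state=0),
    max_features=10,
).fit(X, y)

# threshold=-np.inf: every feature passes the threshold, so the 10
# highest-ranked features are kept.
topk_sel = SelectFromModel(
    RandomForestClassifier(n_estimators=50, random_state=0),
    threshold=-np.inf,
    max_features=10,
).fit(X, y)

importances = default_sel.estimator_.feature_importances_
n_default = int(default_sel.get_support().sum())
n_topk = int(topk_sel.get_support().sum())

print("mean importance:  ", importances.mean())
print("default threshold:", default_sel.threshold_)
print("kept with default threshold:", n_default)
print("kept with threshold=-np.inf:", n_topk)
```

    Printing threshold_ next to the mean importance is a quick way to confirm which cutoff was actually applied. The same pattern should carry over to XGBClassifier, since SelectFromModel only needs the fitted estimator to expose feature_importances_.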