When I use XGBClassifier with SelectFromModel, the algorithm always returns around five features, regardless of the max_features value.
My question is: does XGBClassifier think that there are only five useful features in my dataset?
from sklearn.feature_selection import SelectFromModel
from xgboost import XGBClassifier

# X, y: my feature matrix and labels
sf = SelectFromModel(XGBClassifier(), max_features=10).fit(X, y)
# The output contains only five True values; all the rest are False
print(sf.get_support())
To only select based on max_features, set threshold=-np.inf.
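I found the sentence above in the scikit-learn documentation for SelectFromModel (sklearn.feature_selection). It means that SelectFromModel applies the threshold parameter in addition to max_features: only features whose importance passes the threshold are kept, and max_features merely caps how many of them are returned. With the default threshold (the mean feature importance, unless the estimator uses an L1 penalty), only about five of your features pass, so max_features=10 never becomes the limiting factor.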
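To see which threshold was actually applied, you can inspect the fitted selector. The following is a minimal sketch, not your exact setup: the synthetic dataset and the RandomForestClassifier stand-in are my assumptions, chosen so the snippet runs with scikit-learn alone. XGBClassifier also exposes feature_importances_, so the same check should apply to it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data (assumption for illustration): 30 features, 5 informative.
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=5, n_redundant=0, random_state=0
)

# Stand-in estimator (assumption); no threshold or max_features given,
# so the default threshold is used.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0)
).fit(X, y)

# Importances of the fitted (cloned) estimator and the threshold in effect.
importances = selector.estimator_.feature_importances_
print("threshold used:  ", selector.threshold_)
print("mean importance: ", importances.mean())

# Features at or above the threshold are the ones that get selected.
n_selected = int(selector.get_support().sum())
n_above_mean = int((importances >= importances.mean()).sum())
print("selected:", n_selected, "above mean:", n_above_mean)
```

If the two printed counts agree, the selection is being driven by the mean-importance threshold rather than by max_features.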
If you want max_features to take full effect, set threshold=-np.inf. In that case every feature passes the threshold, and max_features then selects the requested number of features based on their importance rank.
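Here is a small sketch comparing the two settings. As assumptions for illustration, it uses a synthetic dataset and a RandomForestClassifier stand-in so that it runs with scikit-learn alone; you can swap XGBClassifier() back in for your case.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data (assumption for illustration): 30 features, 5 informative.
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=5, n_redundant=0, random_state=0
)

# Stand-in estimator (assumption); SelectFromModel clones it before fitting.
estimator = RandomForestClassifier(n_estimators=100, random_state=0)

# Default threshold plus max_features=10: the threshold still filters,
# so fewer than 10 features may survive.
default_sel = SelectFromModel(estimator, max_features=10).fit(X, y)

# threshold=-np.inf lets every feature pass, so max_features alone decides.
top10_sel = SelectFromModel(
    estimator, threshold=-np.inf, max_features=10
).fit(X, y)

n_default = int(default_sel.get_support().sum())
n_top10 = int(top10_sel.get_support().sum())
print("default threshold:", n_default, "features")
print("threshold=-np.inf:", n_top10, "features")
```

With threshold=-np.inf the selector returns exactly max_features features; with the default threshold it returns at most that many.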