How does one determine what features/columns/attributes to drop using GridSearch results?
In other words, if GridSearch returns that max_features should be 3, can we determine which EXACT 3 features one should use?
Let's take the classic Iris data set with 4 features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn import datasets
iris = datasets.load_iris()
all_inputs = iris.data
all_labels = iris.target
decision_tree_classifier = DecisionTreeClassifier()
parameter_grid = {'max_depth': [1, 2, 3, 4, 5],
                  'max_features': [1, 2, 3, 4]}
cross_validation = StratifiedKFold(n_splits=10)
grid_search = GridSearchCV(decision_tree_classifier,
                           param_grid=parameter_grid,
                           cv=cross_validation)
grid_search.fit(all_inputs, all_labels)
print('Best score: {}'.format(grid_search.best_score_))
print('Best parameters: {}'.format(grid_search.best_params_))
Let's say we get that max_features is 3. How do I find out which 3 features were the most appropriate here?
Putting in max_features = 3 will work for fitting, but I want to know which attributes were the right ones.
Do I have to generate the list of all possible feature combinations myself to feed to GridSearch, or is there an easier way?
If you use an estimator that has the feature_importances_ attribute, you can simply do:
feature_importances = grid_search.best_estimator_.feature_importances_
This returns an array of shape (n_features,) indicating how important each feature was to the best estimator found by the grid search.
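To answer the original question of which exact 3 features to keep, you can pair those importances with the dataset's feature names and take the top 3. A minimal sketch, reusing the iris and grid_search objects defined above:

# Rank features by the best estimator's importances
feature_importances = grid_search.best_estimator_.feature_importances_
ranked = sorted(zip(iris.feature_names, feature_importances),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print('{}: {:.3f}'.format(name, importance))

# The 3 most informative features
top_3 = [name for name, _ in ranked[:3]]
print('Top 3 features: {}'.format(top_3))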
Additionally, if you want to use, say, a linear classifier (logistic regression) that doesn't have the feature_importances_ attribute, what you could do is:
# Get the best estimator's coefficients
estimator_coeff = grid_search.best_estimator_.coef_
# Multiply the model coefficients by the standard deviation of the data
coeff_magnitude = np.std(all_inputs, 0) * estimator_coeff
which is also an indication of feature importance: multiplying by each feature's standard deviation puts the coefficients of differently scaled features on a comparable footing. If a model's coefficient is >> 0 or << 0, that means, in layman's terms, that the model is trying hard to capture the signal present in that feature.
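Putting that together, here is a minimal sketch of the logistic-regression variant on the same Iris data. The parameter grid and the summing of coefficient magnitudes across the three classes are illustrative choices on my part, not something GridSearchCV prescribes:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn import datasets

iris = datasets.load_iris()

# Illustrative grid: C is the inverse regularization strength
grid_search = GridSearchCV(LogisticRegression(max_iter=1000),
                           param_grid={'C': [0.01, 0.1, 1.0, 10.0]},
                           cv=10)
grid_search.fit(iris.data, iris.target)

# coef_ has shape (n_classes, n_features) for this multiclass problem
estimator_coeff = grid_search.best_estimator_.coef_

# Scale by each feature's standard deviation, then aggregate across classes
coeff_magnitude = np.std(iris.data, 0) * estimator_coeff
overall = np.abs(coeff_magnitude).sum(axis=0)

# Features with the largest overall magnitude carry the most signal
for idx in np.argsort(overall)[::-1]:
    print('{}: {:.3f}'.format(iris.feature_names[idx], overall[idx]))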