
Determine what features to drop / select using GridSearch in scikit-learn


How does one determine what features/columns/attributes to drop using GridSearch results?

In other words, if GridSearch returns that max_features should be 3, can we determine which exact 3 features to use?

Let's take the classic Iris data set with 4 features.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold 
from sklearn.model_selection import GridSearchCV
from sklearn import datasets

iris = datasets.load_iris()
all_inputs = iris.data
all_labels = iris.target

decision_tree_classifier = DecisionTreeClassifier()

parameter_grid = {'max_depth': [1, 2, 3, 4, 5],
                  'max_features': [1, 2, 3, 4]}

cross_validation = StratifiedKFold(n_splits=10)

grid_search = GridSearchCV(decision_tree_classifier,
                           param_grid=parameter_grid,
                           cv=cross_validation)

grid_search.fit(all_inputs, all_labels)
print('Best score: {}'.format(grid_search.best_score_))
print('Best parameters: {}'.format(grid_search.best_params_))

Let's say we get that max_features is 3. How do I find out which 3 features were the most appropriate here?

Putting in max_features = 3 will work for fitting, but I want to know which attributes were the right ones.

Do I have to generate the list of all possible feature combinations myself to feed into GridSearch, or is there an easier way?


Solution

  • If you use an estimator that has the attribute feature_importances_, you can simply do:

    feature_importances = grid_search.best_estimator_.feature_importances_
    

    This will return an array of length n_features indicating how important each feature was to the best estimator found by the grid search. Additionally, if you want to use, say, a linear classifier (logistic regression) that doesn't have a feature_importances_ attribute, what you could do is:

    # Get the best estimator's coefficients
    estimator_coeff = grid_search.best_estimator_.coef_
    # Multiply the model coefficients by the standard deviation of the data
    coeff_magnitude = np.std(all_inputs, 0) * estimator_coeff
    

    which is also an indication of feature importance. If a model's coefficient is >> 0 or << 0, that means, in layman's terms, that the model is relying heavily on the signal present in that feature.
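Putting both ideas together on the Iris example, here is a minimal runnable sketch. It reruns the grid search from the question and maps feature_importances_ onto iris.feature_names, then fits a LogisticRegression (the max_iter=1000 setting and the averaging over classes are assumptions for illustration, not part of the original answer) to show the coefficient-magnitude approach:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
all_inputs = iris.data
all_labels = iris.target

# Tree-based route: inspect feature_importances_ on the best estimator
grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={'max_depth': [1, 2, 3, 4, 5],
                'max_features': [1, 2, 3, 4]},
    cv=StratifiedKFold(n_splits=10),
)
grid_search.fit(all_inputs, all_labels)
importances = grid_search.best_estimator_.feature_importances_

# Pair each importance with its feature name, highest first
for name, score in sorted(zip(iris.feature_names, importances),
                          key=lambda pair: pair[1], reverse=True):
    print('{}: {:.3f}'.format(name, score))

# Linear route: scale each coefficient by its feature's standard deviation
log_reg = LogisticRegression(max_iter=1000).fit(all_inputs, all_labels)
coeff_magnitude = np.std(all_inputs, 0) * log_reg.coef_
# One magnitude per feature, averaged over the one-vs-rest class rows
print(np.abs(coeff_magnitude).mean(axis=0))
```

The tree's importances sum to 1, so the ranking directly tells you which of the 4 features the best estimator leaned on most; for the linear model, larger scaled magnitudes play the same role.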