Tags: machine-learning, scikit-learn, data-science, feature-extraction, feature-selection

How can I get the indices of the features behind each score of RFECV?


I am using recursive feature elimination with cross-validation (RFECV) to find the number of features that gives the best accuracy score. As I understand it, grid_scores_ holds the score the estimator produced when trained with the i-th subset of features. Is there any way to get the indices of the feature subset behind each score in grid_scores_? I can already get the indices of the selected features for the highest score (a subset of 5 features) using get_support().

subset_features   score
5                 0.976251
4                 0.9762072
3                 0.97322212

How can I get the indices of the 4-feature or 3-feature subsets? I checked the output of rfecv.ranking_: the 5 selected features have rank 1, but rank 2 contains only one feature, and so on.


Solution

  • A (single) subset of 3 (or 4) features was (probably) never chosen!

    This seems to be a common misconception about how RFECV works; see "How does cross-validated recursive feature elimination drop features in each iteration (sklearn RFECV)?". RFECV fits a separate RFE on each cross-validation fold (say 5), and each of those produces its own set of 3 features, which are probably different from one another. Unfortunately (in this case at least), those per-fold RFE objects are not saved, so you cannot identify which feature sets each fold selected; only the scores are kept (source pt1, pt2). The scores are used to choose the optimal number of features, and then another RFE is trained on the entire dataset to reduce it to the final set of features. This is also why ranking_ cannot give you the 4- or 3-feature subsets: that final RFE stops at 5 features, so rank 2 only marks the feature(s) removed in its last elimination step.
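If you need to see which features each fold would have kept, one option is to rerun the per-fold RFEs yourself and keep them. The sketch below is a hypothetical reconstruction, not an RFECV feature: it assumes step=1, min_features_to_select=1, the same synthetic stand-in data as before, and StratifiedKFold(n_splits=5), which is what cv=5 should resolve to for a classifier. Each fold's RFE is run down to a single feature so that its ranking_ records the full elimination order; the subset of size k is then the features with rank <= k. The helper subset_of_size is a name introduced here for illustration.

```python
# Hypothetical reconstruction of RFECV's per-fold RFEs (not a scikit-learn API).
# Assumes step=1, min_features_to_select=1, and StratifiedKFold(5) as the CV splitter.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
estimator = LogisticRegression(max_iter=1000)


def subset_of_size(ranking, k):
    """Hypothetical helper: indices of the k features that survive longest
    in an RFE run down to 1 feature with step=1 (ranks are then 1..n_features)."""
    return np.flatnonzero(ranking <= k)


fold_rankings = []
for train_idx, _ in StratifiedKFold(n_splits=5).split(X, y):
    # Eliminate down to one feature so ranking_ holds the complete elimination order.
    rfe = RFE(clone(estimator), n_features_to_select=1, step=1)
    rfe.fit(X[train_idx], y[train_idx])
    fold_rankings.append(rfe.ranking_)

# Each fold may pick a different subset of the same size.
for k in (5, 4, 3):
    print(f"k={k}:", [subset_of_size(r, k).tolist() for r in fold_rankings])

# One elimination path on the full dataset gives a single nested subset per size,
# but note these are not the subsets the CV scores were computed on.
full_ranking = RFE(clone(estimator), n_features_to_select=1, step=1).fit(X, y).ranking_
print("full-data subsets:", {k: subset_of_size(full_ranking, k).tolist() for k in (5, 4, 3)})
```

Comparing the per-fold lists for k=3 typically makes the point of the answer concrete: there is no single 3-feature subset behind the 3-feature score, only one subset per fold.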