I am using recursive feature elimination with cross-validation (RFECV) to find the feature subset with the best accuracy score. As I understand it, grid_scores_ holds the score the estimator produced when trained with the i-th subset of features. Is there any way to get the indices of the features in the subset behind each score in grid_scores_? I can get the indices of the selected features for the highest score (a subset of 5 features) using get_support.
subset_features, scores
5 , 0.976251
4 , 0.9762072
3 , 0.97322212
How can I get the indices of the features in the subsets of size 4 or 3? I checked the output of rfecv.ranking_: the 5 selected features have rank 1, but rank 2 has only one feature, and so on.
A (single) subset of 3 (or 4) features was (probably) never chosen!
This seems to be a common misconception about how RFECV works; see "How does cross-validated recursive feature elimination drop features in each iteration (sklearn RFECV)?". There is a separate RFE for each cross-validation fold (say 5), and each will produce its own set of 3 features (probably different across folds). Unfortunately (in this case at least), those RFE objects are not saved, so you cannot identify which set of features each fold selected; only the scores are saved (source pt1, pt2) for choosing the optimal number of features, and then another RFE is trained on the entire dataset to reduce to the final set of features.
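If you do want to see which features each fold kept at each subset size, one option is to run the per-fold RFE yourself. The following is a minimal sketch, not what RFECV exposes: the synthetic dataset, the LogisticRegression estimator, the 5-fold splitter, and the helper name subsets_per_fold are all assumptions made for illustration. It relies on the fact that an RFE run down to a single feature with step=1 assigns rank 1 to the survivor, rank 2 to the last feature eliminated, and so on, so the features with rank <= k are the ones that were still present when k features remained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic data standing in for the real dataset (assumption).
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, random_state=0
)

# Estimator and CV splitter are illustrative choices, not the asker's.
estimator = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def subsets_per_fold(X, y, estimator, cv):
    """Hypothetical helper: map subset size k to the feature indices
    that each fold's RFE still held when k features remained."""
    n_total = X.shape[1]
    result = {}
    for train_idx, _ in cv.split(X, y):
        # Eliminate down to one feature so ranking_ orders every feature.
        rfe = RFE(estimator, n_features_to_select=1, step=1)
        rfe.fit(X[train_idx], y[train_idx])
        for k in range(1, n_total + 1):
            # With step=1, rank <= k marks the k features kept at size k.
            chosen = np.flatnonzero(rfe.ranking_ <= k)
            result.setdefault(k, []).append(chosen.tolist())
    return result


per_fold = subsets_per_fold(X, y, estimator, cv)
for k in (3, 4, 5):
    print(k, per_fold[k])
```

Each entry of per_fold[k] is the list of feature indices for one fold, so comparing the entries shows directly whether the folds agreed on a single subset of size k. To get one subset of size 3 or 4 on the full dataset instead, you could fit a plain RFE with n_features_to_select set to that size and call get_support on it.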