python scikit-learn feature-selection sequential

Solution for "nan" score in step forward selection using Python


I am using sequential feature selection (SFS) from mlxtend to run step forward feature selection.

x_train, x_test = train_test_split(x, test_size = 0.2, random_state = 0)
y_train, y_test = train_test_split(y, test_size = 0.2, random_state = 0)
sfs = SFS(RandomForestClassifier(n_estimators=100, random_state=0, n_jobs = -1),
         k_features = 28,
          forward = True,
          floating = False,
          verbose= 2,
          scoring= "r2",
          cv = 4,
          n_jobs = -1
         ).fit(x_train, y_train)

The code runs, but returns the scoring value as NaN.

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  28 out of  28 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  28 out of  28 | elapsed:    0.1s finished

[2021-12-30 14:15:17] Features: 1/28 -- score: nan[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    0.0s finished

[2021-12-30 14:15:17] Features: 2/28 -- score: nan[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  26 out of  26 | elapsed:    0.0s finished

Solution

  • If you are doing classification, you should not use r2 for scoring: r2 is a regression metric, while RandomForestClassifier predicts class labels. Refer to the scikit-learn documentation for the list of scoring metrics suited to classification and to regression.

    You should also state that you are using SequentialFeatureSelector from mlxtend.

    Below I used accuracy and it works:

    from mlxtend.feature_selection import SequentialFeatureSelector as SFS
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import make_classification

    # Synthetic classification data: 50 features, 28 of them informative
    x, y = make_classification(n_features=50, n_informative=28)

    # Split x and y in one call so the rows stay aligned
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=0)

    sfs = SFS(
        RandomForestClassifier(),
        k_features=28,
        forward=True,
        floating=False,
        verbose=2,
        scoring="accuracy",  # a classification metric, unlike "r2"
    ).fit(x_train, y_train)
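
If a score still comes back as NaN, one way to find out why is to run the same estimator and metric through scikit-learn's cross_val_score with error_score="raise", which re-raises a failure instead of recording NaN. This is a minimal sketch, not part of the original answer: the synthetic data and the helper name check_scoring are assumptions for illustration, and it does not call mlxtend itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real data (assumption: a classification task).
x, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           random_state=0)


def check_scoring(scoring, cv=4):
    """Hypothetical helper: return the CV scores, or the exception raised."""
    clf = RandomForestClassifier(n_estimators=20, random_state=0)
    try:
        # error_score="raise" re-raises a failure instead of recording NaN.
        return cross_val_score(clf, x, y, cv=cv, scoring=scoring,
                               error_score="raise")
    except Exception as exc:
        return exc


acc = check_scoring("accuracy")
print("accuracy per fold:", acc)
```

Swapping in your own data and calling check_scoring("r2") would show whether that metric fails outright on your labels, and with which error message.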