I am using sequential feature selection (sfs) from mlxtend for running step forward feature selection.
x_train, x_test = train_test_split(x, test_size = 0.2, random_state = 0)
y_train, y_test = train_test_split(y, test_size = 0.2, random_state = 0)
sfs = SFS(
    RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1),
    k_features=28,
    forward=True,
    floating=False,
    verbose=2,
    scoring="r2",
    cv=4,
    n_jobs=-1,
).fit(x_train, y_train)
The code runs, but returns the scoring value as NaN.
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 28 out of 28 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 28 out of 28 | elapsed: 0.1s finished
[2021-12-30 14:15:17] Features: 1/28 -- score: nan[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 27 out of 27 | elapsed: 0.0s finished
[2021-12-30 14:15:17] Features: 2/28 -- score: nan[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 26 out of 26 | elapsed: 0.0s finished
If you are doing classification, you should not be using r2 for scoring, because r2 is a regression metric. You can refer to the scikit-learn help page for a list of metrics for classification or regression.
You should also specify that you are using SequentialFeatureSelector from mlxtend.
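To see why r2 is a poor fit for class labels, here is a minimal sketch in plain Python. The helper functions `accuracy` and `r2` and the toy labels are made up for illustration (they are not scikit-learn's implementations). It only shows that r2 treats labels as numeric quantities, so its value changes when the classes are renamed, while accuracy does not. It is not meant as a diagnosis of where the NaN in the question comes from.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted exactly right."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy 3-class problem with one mistake (a class-1 sample predicted as class 2).
y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 2]

# Same predictions, but classes 1 and 2 have swapped names.
swap = {0: 0, 1: 2, 2: 1}
y_true_swapped = [swap[v] for v in y_true]
y_pred_swapped = [swap[v] for v in y_pred]

acc_a = accuracy(y_true, y_pred)
acc_b = accuracy(y_true_swapped, y_pred_swapped)
r2_a = r2(y_true, y_pred)
r2_b = r2(y_true_swapped, y_pred_swapped)

print("accuracy:", acc_a, acc_b)  # identical under relabeling
print("r2:", r2_a, r2_b)          # differs under relabeling
```

Because the class names are arbitrary, a score that depends on them says little about the classifier, which is why a classification metric such as accuracy is the safer choice here.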
Below I used accuracy and it works:
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
x, y = make_classification(n_features=50, n_informative=28)
# Split x and y in one call so the rows stay aligned
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
sfs = SFS(
    RandomForestClassifier(),
    k_features=28,
    forward=True,
    floating=False,
    verbose=2,
    scoring="accuracy",
).fit(x_train, y_train)
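For intuition about what the verbose log is counting (28 candidates at the first step, then 27, then 26), the greedy loop behind forward selection with floating disabled can be sketched roughly as below. This is a simplified illustration, not mlxtend's actual code: `forward_select`, `toy_score`, and the `weights` list are hypothetical, and in the real selector the score would be the cross-validated value of the `scoring` metric for the estimator trained on the candidate columns.

```python
def forward_select(n_features, k, score_fn):
    """Greedily grow a feature subset by one index per step."""
    selected = []
    history = []
    while len(selected) < k:
        # Every feature not yet chosen is a candidate for this step.
        remaining = [j for j in range(n_features) if j not in selected]
        # Keep the single addition that gives the best subset score.
        best = max(remaining, key=lambda j: score_fn(selected + [j]))
        selected.append(best)
        history.append((tuple(selected), score_fn(selected)))
    return selected, history


# Hypothetical scorer: each feature contributes a fixed "usefulness" weight.
weights = [0.1, 0.5, 0.05, 0.3]


def toy_score(subset):
    return sum(weights[j] for j in subset)


selected, history = forward_select(n_features=4, k=2, score_fn=toy_score)
print(selected)  # [1, 3]
for subset, score in history:
    print(subset, round(score, 2))
```

If the score function returns NaN for every candidate, as in the log from the question, the comparison between candidates carries no information, so the scoring metric needs to be fixed before the selected subset can be trusted.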