python machine-learning scikit-learn feature-selection mlxtend

sklearn: ValueError: multiclass format is not supported


Answers to similar questions exist, but none worked for me, so I am posting this.

I am using the mlxtend package to do sequential forward feature selection. I am working on a multiclass (5-class) problem with a random forest estimator.

from sklearn.ensemble import RandomForestClassifier
from mlxtend.feature_selection import SequentialFeatureSelector as SFS 

# initialise model
model = RandomForestClassifier(n_jobs=-1, verbose=0)

# initialise SFS object
sffs = SFS(model, k_features = "best",
           forward = True, floating = True, n_jobs=-1,
           verbose = 2, scoring= "roc_auc", cv=5 )

sffs.fit(X, y)

Error:

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
packages/sklearn/metrics/_scorer.py", line 106, in __call__
    score = scorer._score(cached_call, estimator, *args, **kwargs)
  File "~/venv/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 352, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Package versions:

>>> import sklearn, mlxtend

>>> print(sklearn.__version__)
1.0.2
>>> print(mlxtend.__version__)
0.22.0

Solution

  • The traditional ROC AUC was designed as a metric for binary classification. scikit-learn's plain "roc_auc" scorer does not support multiclass targets, which is what the error says.

    Instead, you can reduce the multiclass problem to a set of binary problems with a one-vs-rest strategy: for each class, ask whether a sample belongs to that class or to any other. To do so, use scoring="roc_auc_ovr":

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from mlxtend.feature_selection import SequentialFeatureSelector as SFS 
    
    # Load dataset
    iris = load_iris()
    X = iris.data
    y = iris.target
    
    model = RandomForestClassifier(n_jobs=-1, verbose=0)
    
    sffs = SFS(model, 
               k_features = "best",
               forward = True, 
               floating = True, 
               n_jobs=-1,
               verbose = 2, 
               scoring= "roc_auc_ovr", 
               cv=5 )
    
    sffs.fit(X, y)
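
To make the one-vs-rest idea concrete, here is a minimal sketch that uses scikit-learn only (no mlxtend). It assumes a single stratified train/test split on iris with a fixed random_state, which is an illustrative choice and not part of the original question. It computes the one-vs-rest ROC AUC with roc_auc_score(..., multi_class="ovr") and then rebuilds the same number by hand as the unweighted mean of the per-class binary AUCs.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Illustrative split; test_size and random_state are assumptions for this sketch
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Multiclass ROC AUC needs class probabilities, one column per class
proba = model.predict_proba(X_test)

# One-vs-rest ROC AUC, macro-averaged over classes (the default average)
auc_ovr = roc_auc_score(y_test, proba, multi_class="ovr")

# The same quantity by hand: one binary "this class vs. the rest" AUC per class
per_class_auc = [
    roc_auc_score((y_test == label).astype(int), proba[:, i])
    for i, label in enumerate(model.classes_)
]
manual_auc = sum(per_class_auc) / len(per_class_auc)

print(auc_ovr, manual_auc)
```

The two values should agree up to floating-point error. If the classes are imbalanced, scikit-learn also provides the scorer names "roc_auc_ovr_weighted", "roc_auc_ovo" and "roc_auc_ovo_weighted", which can be passed to scoring in the same way.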