
One-class-only folds tested through GridSearchCV


When using GridSearchCV on a custom estimator that wraps SVC, I get the error: "ValueError: The number of classes has to be greater than one; got 1 class"

The custom estimator exists to expose additional grid-search parameters, and until now it seemed to work fine.

Using the debugger, I confirmed that a train set containing only one class is indeed passed to my estimator, so there are two possibilities:

  • Either the estimator should handle a one-class-only set

  • Or GridSearchCV should not produce a one-class-only set

Since the error comes from the SVC.fit call, and SVC is apparently not meant to receive one-class-only sets, I think it is the second option. However, I looked through the GridSearchCV implementation and could not find anywhere that it checks for a one-class-only fold, nor any reason why that check would fail.
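To illustrate how a splitter can hand a classifier a one-class-only train set, here is a minimal sketch. It assumes a toy label vector sorted by class and unshuffled splitting; the data and the helper `one_class_train_folds` are hypothetical and are not taken from my actual setup. It compares `KFold` and `StratifiedKFold` on the same labels.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy data: 9 samples, labels sorted by class (6 of class 0, 3 of class 1).
y = np.array([0] * 6 + [1] * 3)
X = np.zeros((9, 1))


def one_class_train_folds(splitter, X, y):
    """Return the indices of the splits whose training part contains a single class."""
    bad = []
    for i, (train_idx, _test_idx) in enumerate(splitter.split(X, y)):
        if len(np.unique(y[train_idx])) < 2:
            bad.append(i)
    return bad


# Unshuffled KFold cuts the data into contiguous blocks, so the last split
# trains only on the first six samples, which are all class 0.
kfold_bad = one_class_train_folds(KFold(n_splits=3), X, y)

# StratifiedKFold spreads each class across the folds.
strat_bad = one_class_train_folds(StratifiedKFold(n_splits=3), X, y)

print("KFold one-class train folds:", kfold_bad)
print("StratifiedKFold one-class train folds:", strat_bad)
```

A check like this on the indices produced by the inner splitter is a quick way to see whether the splitter or the estimator is responsible.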

I used the grid search inside an outer cross-validation loop to do nested cross-validation:

gs = GridSearchCV(clf.gs_clf.get_gs_clf(), parameter_grid, cv=n_inner_splits, iid=False)
gs.fit(*clf.get_train_set(X, y, train_index))


Solution

  • I found the real issue. The documentation of GridSearchCV says this about the cv parameter:

    # For integer/None inputs, if the estimator is a classifier and ``y`` is
    # either binary or multiclass, `StratifiedKFold` is used. In all
    # other cases, `KFold` is used.
    

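    StratifiedKFold preserves the class proportions in every fold, so it does not produce one-class-only subsets as long as each class has at least as many samples as there are splits.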

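    My custom estimator was not recognized as a classifier, so plain KFold was used. The solution was to make the custom estimator inherit from sklearn.base.ClassifierMixin, so that GridSearchCV treats it as a classifier and uses StratifiedKFold.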
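The effect of the mixin can be sketched as follows. The wrapper classes `PlainWrapper` and `ClassifierWrapper` are hypothetical stand-ins for the custom estimator, not the original code. The sketch uses `sklearn.model_selection.check_cv` together with `sklearn.base.is_classifier` to show which splitter an integer `cv` resolves to, which as far as I can tell mirrors what GridSearchCV does internally.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, is_classifier
from sklearn.model_selection import KFold, StratifiedKFold, check_cv
from sklearn.svm import SVC


class PlainWrapper(BaseEstimator):
    """Hypothetical SVC wrapper that does NOT declare itself a classifier."""

    def __init__(self, C=1.0):
        self.C = C

    def fit(self, X, y):
        self.svc_ = SVC(C=self.C).fit(X, y)
        return self

    def predict(self, X):
        return self.svc_.predict(X)


class ClassifierWrapper(ClassifierMixin, PlainWrapper):
    """Same wrapper, but with ClassifierMixin first in the bases."""


# Binary toy labels (assumed for illustration).
y = np.array([0] * 6 + [1] * 3)

# Resolve an integer cv the same way for both wrappers.
plain_cv = check_cv(3, y, classifier=is_classifier(PlainWrapper()))
clf_cv = check_cv(3, y, classifier=is_classifier(ClassifierWrapper()))

print(type(plain_cv).__name__)
print(type(clf_cv).__name__)
```

Listing ClassifierMixin before the other base classes keeps the method resolution order in the form scikit-learn expects for mixins.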