Tags: scikit-learn, svm, multiclass-classification

Can the parameters of a classifier be different for multiple classes in OneVsRestClassifier?


Does anyone know if sklearn supports different parameters for the various classifiers inside a OneVsRestClassifier? For instance, in the example below, I would like to use different values of C for the different classes.

from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
text_clf = OneVsRestClassifier(LinearSVC(C=1.0, class_weight="balanced"))

Solution

  • No, OneVsRestClassifier does not currently support different estimator parameters, or different estimators, for different classes.

    Some estimators, like LogisticRegressionCV, will automatically tune different parameter values per class (a sketch follows), but this has not been extended to OneVsRestClassifier yet.
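
    For illustration, here is a minimal, hedged sketch of that per-class tuning with LogisticRegressionCV; the iris data is only a placeholder:

        from sklearn.datasets import load_iris
        from sklearn.linear_model import LogisticRegressionCV

        X, y = load_iris(return_X_y=True)

        # With multi_class='ovr', each one-vs-rest binary problem is
        # cross-validated separately, so each class can end up with its
        # own best C out of the Cs grid
        clf = LogisticRegressionCV(Cs=10, cv=5, multi_class='ovr').fit(X, y)
        print(clf.C_)  # shape (n_classes,): one selected C per class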

    But if you want that behaviour, you can change the source to implement it.

    The current source of fit() in the master branch looks like this:

        ... 
        ...
        self.estimators_ = Parallel(n_jobs=self.n_jobs)(delayed(_fit_binary)(
            self.estimator, X, column, classes=[
                "not %s" % self.label_binarizer_.classes_[i],
                self.label_binarizer_.classes_[i]])
            for i, column in enumerate(columns))
    

    As you can see, the same estimator (self.estimator) is passed to every class during training. So we will write a new version of OneVsRestClassifier that changes this:

    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import LabelBinarizer
    # Note: sklearn.externals.joblib has been removed in newer
    # scikit-learn releases; there, import from joblib directly
    from sklearn.externals.joblib import Parallel, delayed
    from sklearn.multiclass import _fit_binary
    
    class CustomOneVsRestClassifier(OneVsRestClassifier):
    
        # Changed the single estimator to estimators, which takes a list
        # (one estimator per class)
        def __init__(self, estimators, n_jobs=1):
            self.estimators = estimators
            self.n_jobs = n_jobs
    
        def fit(self, X, y):
    
            self.label_binarizer_ = LabelBinarizer(sparse_output=True)
            Y = self.label_binarizer_.fit_transform(y)
            Y = Y.tocsc()
            self.classes_ = self.label_binarizer_.classes_
            columns = (col.toarray().ravel() for col in Y.T)
    
            # This is where we change the training: each class column is
            # zipped with its own estimator
            self.estimators_ = Parallel(n_jobs=self.n_jobs)(delayed(_fit_binary)(
                estimator, X, column, classes=[
                    "not %s" % self.label_binarizer_.classes_[i],
                    self.label_binarizer_.classes_[i]])
                for i, (column, estimator) in enumerate(zip(columns, self.estimators)))
            return self
    

    And now you can use it:

    from sklearn.svm import LinearSVC

    # Make sure you add as many estimators as there are classes
    # (in the binary case, only a single estimator should be used)
    estimators = []

    # Assuming 3 classes here
    estimators.append(LinearSVC(C=1.0, class_weight="balanced"))
    estimators.append(LinearSVC(C=0.1, class_weight="balanced"))
    estimators.append(LinearSVC(C=10, class_weight="balanced"))
    clf = CustomOneVsRestClassifier(estimators)
    
    clf.fit(X, y)
    
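    For a quick end-to-end check, here is a sketch on synthetic placeholder data (make_classification just stands in for your own X and y):

        from sklearn.datasets import make_classification

        # Placeholder data with 3 classes, matching the 3 estimators above
        X, y = make_classification(n_samples=300, n_features=20,
                                   n_informative=10, n_classes=3,
                                   random_state=0)

        clf = CustomOneVsRestClassifier(estimators)
        clf.fit(X, y)

        # predict() is inherited from OneVsRestClassifier and works
        # unchanged, since it only uses the fitted self.estimators_
        print(clf.predict(X[:5]))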

    Note: I haven't implemented partial_fit() here yet. If you intend to use it, we can work on that.
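
    In case it helps, here is a rough, untested sketch of what a per-estimator partial_fit() could look like inside CustomOneVsRestClassifier. It assumes every base estimator itself supports partial_fit (e.g. SGDClassifier; LinearSVC does not):

        import numpy as np

        def partial_fit(self, X, y, classes=None):
            # On the first call, fit the binarizer on the full set of
            # classes and give each estimator its own class column
            if not hasattr(self, "label_binarizer_"):
                self.label_binarizer_ = LabelBinarizer(sparse_output=True)
                self.label_binarizer_.fit(classes)
                self.classes_ = self.label_binarizer_.classes_
                self.estimators_ = self.estimators

            Y = self.label_binarizer_.transform(y).tocsc()
            columns = (col.toarray().ravel() for col in Y.T)
            for estimator, column in zip(self.estimators_, columns):
                # Each binary sub-problem sees 0/1 labels
                estimator.partial_fit(X, column, classes=np.array([0, 1]))
            return self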