Tags: scikit-learn, svm, multiclass-classification

Can the parameters of a classifier be different for multiple classes in OneVsRestClassifier?


Does anyone know if sklearn supports different parameters for the various classifiers inside a OneVsRestClassifier? For instance, in the example below, I would like to use different values of C for the different classes.

from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
text_clf = OneVsRestClassifier(LinearSVC(C=1.0, class_weight="balanced"))

Solution

  • No, OneVsRestClassifier does not currently support different estimator parameters, or different estimators, for different classes.

    Some estimators, like LogisticRegressionCV, will automatically tune different parameter values per class (a sketch follows), but this has not been extended to OneVsRestClassifier yet.
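
    For illustration, here is a minimal, hedged sketch of that per-class tuning with LogisticRegressionCV; the iris data is only a placeholder:

        from sklearn.datasets import load_iris
        from sklearn.linear_model import LogisticRegressionCV

        X, y = load_iris(return_X_y=True)

        # With multi_class='ovr', each one-vs-rest binary problem is
        # cross-validated separately, so each class can end up with its
        # own best C out of the Cs grid
        clf = LogisticRegressionCV(Cs=10, cv=5, multi_class='ovr').fit(X, y)
        print(clf.C_)  # shape (n_classes,): one selected C per class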

    But if you want that behaviour, you can change the source to implement it.

    The current source of fit() in the master branch looks like this:

        ... 
        ...
        self.estimators_ = Parallel(n_jobs=self.n_jobs)(delayed(_fit_binary)(
            self.estimator, X, column, classes=[
                "not %s" % self.label_binarizer_.classes_[i],
                self.label_binarizer_.classes_[i]])
            for i, column in enumerate(columns))
    

    As you can see, the same estimator (self.estimator) is passed to every class during training. So we will write a new version of OneVsRestClassifier that changes this:

    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import LabelBinarizer
    # Note: sklearn.externals.joblib has been removed in newer
    # scikit-learn releases; there, import from joblib directly
    from sklearn.externals.joblib import Parallel, delayed
    from sklearn.multiclass import _fit_binary
    
    class CustomOneVsRestClassifier(OneVsRestClassifier):
    
        # Changed the single estimator to estimators, which takes a list
        # (one estimator per class)
        def __init__(self, estimators, n_jobs=1):
            self.estimators = estimators
            self.n_jobs = n_jobs
    
        def fit(self, X, y):
    
            self.label_binarizer_ = LabelBinarizer(sparse_output=True)
            Y = self.label_binarizer_.fit_transform(y)
            Y = Y.tocsc()
            self.classes_ = self.label_binarizer_.classes_
            columns = (col.toarray().ravel() for col in Y.T)
    
            # This is where we change the training: each class column is
            # zipped with its own estimator
            self.estimators_ = Parallel(n_jobs=self.n_jobs)(delayed(_fit_binary)(
                estimator, X, column, classes=[
                    "not %s" % self.label_binarizer_.classes_[i],
                    self.label_binarizer_.classes_[i]])
                for i, (column, estimator) in enumerate(zip(columns, self.estimators)))
            return self
    

    And now you can use it:

    from sklearn.svm import LinearSVC

    # Make sure you add as many estimators as there are classes
    # (in the binary case, only a single estimator should be used)
    estimators = []

    # Assuming 3 classes here
    estimators.append(LinearSVC(C=1.0, class_weight="balanced"))
    estimators.append(LinearSVC(C=0.1, class_weight="balanced"))
    estimators.append(LinearSVC(C=10, class_weight="balanced"))
    clf = CustomOneVsRestClassifier(estimators)
    
    clf.fit(X, y)
    
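    For a quick end-to-end check, here is a sketch on synthetic placeholder data (make_classification just stands in for your own X and y):

        from sklearn.datasets import make_classification

        # Placeholder data with 3 classes, matching the 3 estimators above
        X, y = make_classification(n_samples=300, n_features=20,
                                   n_informative=10, n_classes=3,
                                   random_state=0)

        clf = CustomOneVsRestClassifier(estimators)
        clf.fit(X, y)

        # predict() is inherited from OneVsRestClassifier and works
        # unchanged, since it only uses the fitted self.estimators_
        print(clf.predict(X[:5]))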

    Note: I haven't implemented partial_fit() here yet. If you intend to use it, we can work on that.
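
    In case it helps, here is a rough, untested sketch of what a per-estimator partial_fit() could look like inside CustomOneVsRestClassifier. It assumes every base estimator itself supports partial_fit (e.g. SGDClassifier; LinearSVC does not):

        import numpy as np

        def partial_fit(self, X, y, classes=None):
            # On the first call, fit the binarizer on the full set of
            # classes and give each estimator its own class column
            if not hasattr(self, "label_binarizer_"):
                self.label_binarizer_ = LabelBinarizer(sparse_output=True)
                self.label_binarizer_.fit(classes)
                self.classes_ = self.label_binarizer_.classes_
                self.estimators_ = self.estimators

            Y = self.label_binarizer_.transform(y).tocsc()
            columns = (col.toarray().ravel() for col in Y.T)
            for estimator, column in zip(self.estimators_, columns):
                # Each binary sub-problem sees 0/1 labels
                estimator.partial_fit(X, column, classes=np.array([0, 1]))
            return self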