Tags: python, classification, text-classification, naive-bayes, multiclass-classification

Is there a fast way to train many models at the same time?


I want to train a set of 2-way classifiers. That is, assume I have 4 classes that I want to classify a text into. I don't want to group all the training data into one training set where the labels would be the 4 classes. Rather, I want to make binary labels: I first make 4 copies of the dataset, then in the first copy I label class A and the rest Not A, in the second copy B and Not B, and so on.

After that, I have to build 4 models (naive Bayes, for example) and train one on each dataset I made. What I want is a method that does all of this without the manual work. Is that possible?
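For concreteness, the manual relabeling I mean would look something like this (the class names and labels are just placeholders):

    import numpy as np
    
    y = np.array(['A', 'B', 'C', 'D', 'A', 'C'])  # hypothetical 4-class labels
    
    # One binary copy of the labels per class: A / Not A, B / Not B, ...
    binary_labels = {c: np.where(y == c, c, 'Not ' + c) for c in 'ABCD'}
    print(binary_labels['A'])  # ['A' 'Not A' 'Not A' 'Not A' 'A' 'Not A']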


Solution

  • Yes, this strategy, where a separate binary classifier is fit for each of the classes present in a single dataset, is called "one versus all" or "one versus rest". Some sklearn models expose this directly as a parameter, such as logistic regression, where you can set the multi_class parameter to 'ovr' for one vs. rest, as in the sketch below.
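
    A minimal sketch of that parameter route, with made-up toy data (the numbers and class names are placeholders, not from the question):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    
    # Toy data: 6 samples, 2 features, 3 well-separated classes
    X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]])
    y = np.array(['A', 'A', 'B', 'B', 'C', 'C'])
    
    # multi_class='ovr' fits one binary logistic regression per class
    # (recent sklearn versions deprecate this parameter, so the
    # OneVsRestClassifier wrapper below is the more durable route)
    clf = LogisticRegression(multi_class='ovr').fit(X, y)
    print(clf.predict([[5, 5]]))  # likely ['B'] on this separable data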

    There's a nice sklearn object that makes this easy for other algorithms, called OneVsRestClassifier. For your naive Bayes example, it's as easy as:

    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.naive_bayes import GaussianNB
    
    # Fits one binary GaussianNB per class behind the scenes
    clf = OneVsRestClassifier(GaussianNB())
    

    Then you can use your classifier as normal from there, e.g. clf.fit(X, y); a fuller sketch with toy data follows.
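
    For instance, a minimal end-to-end sketch with hypothetical toy data (GaussianNB expects dense numeric features, so real text would need to be vectorized first):

    import numpy as np
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.naive_bayes import GaussianNB
    
    # Hypothetical numeric features for 8 documents across 4 classes
    X = np.array([[1.0, 0.1], [1.1, 0.0], [0.0, 1.0], [0.1, 1.2],
                  [3.0, 3.0], [3.1, 2.9], [5.0, 0.0], [5.2, 0.1]])
    y = np.array(['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'])
    
    clf = OneVsRestClassifier(GaussianNB()).fit(X, y)
    print(clf.predict([[3.0, 3.1]]))        # likely ['C']
    print(clf.predict_proba([[3.0, 3.1]]))  # one column per class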

    (Interestingly, a one-versus-rest naive Bayes model is not simply equivalent to multinomial naive Bayes when there are three or more classes, as I had initially assumed. There's a short example here which demonstrates this; a rough sketch of the comparison also follows.)
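
    A rough sketch of that comparison, using arbitrary made-up count data (the exact numbers don't matter; the point is that the two models disagree):

    import numpy as np
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.naive_bayes import MultinomialNB
    
    rng = np.random.default_rng(0)
    X = rng.integers(0, 10, size=(30, 5))  # fake word-count features
    y = rng.integers(0, 3, size=30)        # three classes
    
    multi = MultinomialNB().fit(X, y)
    ovr = OneVsRestClassifier(MultinomialNB()).fit(X, y)
    
    # The class probabilities generally differ, because each binary model
    # pools the "rest" classes into a single multinomial distribution
    print(multi.predict_proba(X[:1]))
    print(ovr.predict_proba(X[:1]))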