Tags: python, classification, text-classification, naive-bayes, multiclass-classification

Is there a fast way to train many models at the same time?


I want to train a set of 2-way classifiers. That is, assume I have 4 classes that I want to classify a text into. I don't want to group all the training data into one training set where the labels would be the 4 classes. Rather, I want to make binary labels: I first make 4 copies of the dataset, then in the first copy I label class A and the rest Not A, in the second copy B and Not B, and so on.

After that, I have to build 4 models (naive Bayes, for example) and train one on each dataset I made. What I want is a method that does all of this without the manual work. Is that possible?
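For concreteness, the manual relabeling I mean would look something like this (the class names and labels are just placeholders):

    import numpy as np
    
    y = np.array(['A', 'B', 'C', 'D', 'A', 'C'])  # hypothetical 4-class labels
    
    # One binary copy of the labels per class: A / Not A, B / Not B, ...
    binary_labels = {c: np.where(y == c, c, 'Not ' + c) for c in 'ABCD'}
    print(binary_labels['A'])  # ['A' 'Not A' 'Not A' 'Not A' 'A' 'Not A']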


Solution

  • Yes, this strategy, where a separate binary classifier is fit for each of the classes present in a single dataset, is called "one versus all" or "one versus rest". Some sklearn models expose this directly as a parameter, such as logistic regression, where you can set the multi_class parameter to 'ovr' for one vs. rest, as in the sketch below.
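
    A minimal sketch of that parameter route, with made-up toy data (the numbers and class names are placeholders, not from the question):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    
    # Toy data: 6 samples, 2 features, 3 well-separated classes
    X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]])
    y = np.array(['A', 'A', 'B', 'B', 'C', 'C'])
    
    # multi_class='ovr' fits one binary logistic regression per class
    # (recent sklearn versions deprecate this parameter, so the
    # OneVsRestClassifier wrapper below is the more durable route)
    clf = LogisticRegression(multi_class='ovr').fit(X, y)
    print(clf.predict([[5, 5]]))  # likely ['B'] on this separable data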

    There's a nice sklearn object that makes this easy for other algorithms, called OneVsRestClassifier. For your naive Bayes example, it's as easy as:

    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.naive_bayes import GaussianNB
    
    # Fits one binary GaussianNB per class behind the scenes
    clf = OneVsRestClassifier(GaussianNB())
    

    Then you can use your classifier as normal from there, e.g. clf.fit(X, y); a fuller sketch with toy data follows.
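
    For instance, a minimal end-to-end sketch with hypothetical toy data (GaussianNB expects dense numeric features, so real text would need to be vectorized first):

    import numpy as np
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.naive_bayes import GaussianNB
    
    # Hypothetical numeric features for 8 documents across 4 classes
    X = np.array([[1.0, 0.1], [1.1, 0.0], [0.0, 1.0], [0.1, 1.2],
                  [3.0, 3.0], [3.1, 2.9], [5.0, 0.0], [5.2, 0.1]])
    y = np.array(['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'])
    
    clf = OneVsRestClassifier(GaussianNB()).fit(X, y)
    print(clf.predict([[3.0, 3.1]]))        # likely ['C']
    print(clf.predict_proba([[3.0, 3.1]]))  # one column per class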

    (Interestingly, a one-versus-rest naive Bayes model is not simply equivalent to multinomial naive Bayes when there are three or more classes, as I had initially assumed. There's a short example here which demonstrates this; a rough sketch of the comparison also follows.)
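
    A rough sketch of that comparison, using arbitrary made-up count data (the exact numbers don't matter; the point is that the two models disagree):

    import numpy as np
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.naive_bayes import MultinomialNB
    
    rng = np.random.default_rng(0)
    X = rng.integers(0, 10, size=(30, 5))  # fake word-count features
    y = rng.integers(0, 3, size=30)        # three classes
    
    multi = MultinomialNB().fit(X, y)
    ovr = OneVsRestClassifier(MultinomialNB()).fit(X, y)
    
    # The class probabilities generally differ, because each binary model
    # pools the "rest" classes into a single multinomial distribution
    print(multi.predict_proba(X[:1]))
    print(ovr.predict_proba(X[:1]))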