I want to train 2-way (binary) classifiers. That is, assume I have 4 classes that I want to classify a text into. I don't want to group all the training data into one training set where the labels would be the 4 classes. Rather, I want to make binary labels. For example, I would first make 4 copies of the dataset; in the first one the labels would be A and Not A, in the second one B and Not B, and so on.
After that, I would have to build 4 models (naive Bayes, for example) and train one on each dataset I made. What I want is a method that does all of that without all of this manual work. Is that possible?
Yes. This strategy, where a separate binary classifier is fit for each class in the dataset, is called "one versus all" or "one versus rest". Some sklearn models expose it as a parameter; for example, in logistic regression you can set the multi_class parameter to 'ovr' for one vs. rest.
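To make that concrete, here's a minimal sketch of that option (it assumes X and y already hold your features and 4-class labels):

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(multi_class='ovr')  # one binary (class vs. rest) problem per class
clf.fit(X, y)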
There's a nice sklearn wrapper that makes it easy for other algorithms, called OneVsRestClassifier. For your naive Bayes example, it's as easy as:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
clf = OneVsRestClassifier(GaussianNB())  # one GaussianNB per class, each trained class vs. rest
Then you can use your classifier as normal from there, e.g. clf.fit(X, y).
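If you want a self-contained sketch end to end (the synthetic data from make_classification here is purely illustrative, standing in for your real text features and labels):

from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB

# toy 4-class problem standing in for your real features/labels
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=4, random_state=0)

clf = OneVsRestClassifier(GaussianNB())
clf.fit(X, y)                # internally fits 4 binary "class vs. rest" models
print(len(clf.estimators_))  # 4 -- one fitted GaussianNB per class
print(clf.predict(X[:5]))    # predictions come back as the original class labels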
(Interestingly, a one-versus-rest naive Bayes model is not simply equivalent to multinomial naive Bayes when there are three or more classes, as I had initially assumed. There's a short example here which demonstrates this.)
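In case that link goes stale, here's a rough sketch of the kind of comparison that shows the difference. The digits dataset is my own choice for illustration (not necessarily what the linked example uses); its non-negative integer features suit multinomial naive Bayes:

from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB

X, y = load_digits(return_X_y=True)  # 10 classes, non-negative integer features

multi = MultinomialNB().fit(X, y)
ovr = OneVsRestClassifier(MultinomialNB()).fit(X, y)

# the posterior probabilities generally differ, and the hard predictions can disagree too
print(multi.predict_proba(X[:1]).round(3))
print(ovr.predict_proba(X[:1]).round(3))
print((multi.predict(X) != ovr.predict(X)).sum(), "differing predictions")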