I am trying to train a OneVsAll multiclass logistic regression model using sklearn.linear_model.LogisticRegression(multiclass='ovr')
. My dataset has over 1000 classes and 2 million training examples.
From what I understood was that this method will train 1000 different classifiers, one for each class. While doing so, the set of positive examples for each class is easy to identify. But what is the set of negative examples for each classifier? Is the set of negative examples = all the other data points in my entire training data? Won't this create an imbalance problem and reduce the effectiveness of each individual classifier?
Is the set of negative examples = all the other data points in my entire training data?
Yes.
Won't this create an imbalance problem and reduce the effectiveness of each individual classifier?
Yes, according to Bishop, Christopher M. (2006). "Pattern Recognition and Machine Learning". Springer, p. 338
it is one of the problems with this heuristic. If this seriously degrades performance in your particular case, you can consider other strategies.