How are the probabilities normalized in the one-vs-rest scheme of sklearn LogisticRegression?

In the sklearn LogisticRegression classifier, we can set the multi_class option to ovr, which stands for one-vs-rest, as in the following code snippet:

# logistic regression for multi-class classification using built-in one-vs-rest
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=3, random_state=1)
# define model
model = LogisticRegression(multi_class='ovr')
# fit model
model.fit(X, y)

Now, this classifier can assign probabilities to different classes for given instances:

# make predictions
yhat = model.predict_proba(X)

The probabilities sum to 1 for each instance:

array([[0.16973178, 0.46755188, 0.36271634],
       [0.58228627, 0.0928127 , 0.32490103],
       [0.28241256, 0.51175978, 0.20582766],
       [0.17922774, 0.71300755, 0.10776471],
       [0.05888508, 0.24924809, 0.69186683],
       [0.25808835, 0.68599321, 0.05591844]])

My question: In the one-vs-rest method, a separate classifier is trained for each class, so we would expect each class probability to be independent of the others. How are the probabilities normalized to sum to 1?


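  • In scikit-learn's one-vs-rest implementation, multiclass is handled by passing each class's decision score for the instance x through the logistic sigmoid and then normalizing over all classes. The estimated probability that the instance belongs to class k is given by

    $$\hat{P}(y = k \mid x) = \frac{\sigma(f_k(x))}{\sum_{j=1}^{K} \sigma(f_j(x))}, \qquad \sigma(z) = \frac{1}{1 + e^{-z}},$$

    with f_k representing the decision function of the classifier for class k and K the number of classes.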
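
The normalization can be checked numerically. The following is a minimal sketch rather than the library's own code: it assumes that wrapping LogisticRegression in OneVsRestClassifier behaves like multi_class='ovr' (recent scikit-learn releases deprecate that option in favour of the wrapper), and the names scores, sigmoid_scores and manual are illustrative only. It applies the sigmoid to each per-class decision score, divides each row by its sum, and compares the result with predict_proba.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# same synthetic dataset as in the question
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, n_classes=3, random_state=1)

# assumption: the explicit one-vs-rest wrapper stands in for multi_class='ovr'
model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y)

# f_k(x): one decision score per class, shape (n_samples, K)
scores = model.decision_function(X)

# sigmoid of each score: the independent per-class probabilities
sigmoid_scores = 1.0 / (1.0 + np.exp(-scores))

# divide each row by its sum so that the K values add up to 1
manual = sigmoid_scores / sigmoid_scores.sum(axis=1, keepdims=True)

proba = model.predict_proba(X)
print(np.allclose(manual, proba))           # do the two computations agree?
print(np.allclose(proba.sum(axis=1), 1.0))  # does every row sum to 1?
```

Before the division, the rows of sigmoid_scores generally do not sum to 1, which is exactly the independence the question points out; the row-wise division is what turns them into a distribution over the classes.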