
Optimization solver used for one-vs-rest in scikit-learn


I am trying to solve a multiclass classification problem using logistic regression. My dataset has 3 distinct classes, and each data point belongs to exactly one class. Here is a sample of the training data:

[screenshot of the training data]

Here the first column is a vector of ones that I added as a bias term, and the target column has been binarized using label_binarize, as described in the scikit-learn documentation.

Then I got the target as follows:

array([[1, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [1, 0, 0],
       [1, 0, 0]])
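
For reference, such a target matrix can be produced with label_binarize roughly like this (the raw labels y below are assumed purely for illustration):

    from sklearn.preprocessing import label_binarize

    # Assumed raw class labels for five samples, consistent with the target above
    y = [0, 0, 1, 0, 0]

    # label_binarize builds one indicator column per class: row i gets a 1
    # in the column of sample i's class and 0 elsewhere
    label_train = label_binarize(y, classes=[0, 1, 2])
    print(label_train)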

Next, I am training it using the concept of one vs. rest, i.e. training one classifier at a time. Sample code:


for i in range(label_train.shape[1]):
    # one binary classifier per label column
    clf = LogisticRegression(random_state=0, multi_class='ovr',
                             solver='liblinear', fit_intercept=True)
    clf.fit(train_data_copy, label_train[:, i])
    #print(clf.coef_.shape)

As you can see, I am training 3 classifiers in total, one for each label. I have two questions here.

First question: as per the scikit-learn documentation,

multi_class : {‘auto’, ‘ovr’, ‘multinomial’}, default=’auto’

    If the option chosen is ‘ovr’, then a binary problem is fit for each label. For ‘multinomial’ the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. ‘multinomial’ is unavailable when solver=’liblinear’. ‘auto’ selects ‘ovr’ if the data is binary, or if solver=’liblinear’, and otherwise selects ‘multinomial’.

My question is: since I chose the liblinear solver (this being an OvR problem), does it matter whether I select multi_class='auto' or multi_class='ovr'?

My second question is regarding the intercept (or bias) term. The documentation says that if fit_intercept=True, a bias term is added to the decision function. But I noticed that when I did not add a vector of ones to my data matrix, the number of parameters in the coefficient vector theta was the same as the number of features, even though fit_intercept=True. My question is: do we have to add a vector of ones to the data matrix, as well as have fit_intercept enabled, in order for a bias term to be added to the decision function?


Solution

    1. It does not matter: as you might see here, choosing either multi_class='auto' or multi_class='ovr' leads to the same results whenever solver='liblinear' (see the quick check right after this list).
    2. In case solver='liblinear', a default bias term equal to 1 is used and appended to X via the intercept_scaling attribute (which is in turn useful only if fit_intercept=True), as you can see here. After fitting, the fitted bias (of shape (n_classes,)) is returned by intercept_ (zero-valued if fit_intercept=False). The fitted coefficients are returned by coef_, of shape (n_classes, n_features) and not (n_classes, n_features + 1); the splitting is done here.
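
    As a quick check of point 1, here is a minimal sketch on the Iris dataset (chosen just for illustration):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    import numpy as np

    X, y = load_iris(return_X_y=True)

    clf_auto = LogisticRegression(random_state=0, multi_class='auto',
                                  solver='liblinear').fit(X, y)
    clf_ovr = LogisticRegression(random_state=0, multi_class='ovr',
                                 solver='liblinear').fit(X, y)

    # With solver='liblinear', 'auto' resolves to 'ovr', so both fits
    # produce identical parameters
    print(np.allclose(clf_auto.coef_, clf_ovr.coef_))            # True
    print(np.allclose(clf_auto.intercept_, clf_ovr.intercept_))  # True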

    And here is an example for point 2, considering the Iris dataset (which has 3 classes and 4 features):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    X, y = load_iris(return_X_y=True)
    
    clf = LogisticRegression(random_state=0, fit_intercept=True, multi_class='ovr', solver='liblinear')
    clf.fit(X, y)
    clf.intercept_, clf.coef_
    ################################
    (array([ 0.26421853,  1.09392467, -1.21470917]),
     array([[ 0.41021713,  1.46416217, -2.26003266, -1.02103509],
            [ 0.4275087 , -1.61211605,  0.5758173 , -1.40617325],
            [-1.70751526, -1.53427768,  2.47096755,  2.55537041]]))
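
    This also addresses your second question: with fit_intercept=True you do not need to append a column of ones yourself; the bias simply lands in intercept_ rather than in coef_. Conversely, if you do append the ones column manually, disable the built-in intercept. A sketch continuing the snippet above (with liblinear the manual column is regularized just like the internal one, so the numbers should closely match):

    import numpy as np

    # Append the bias column by hand and disable the built-in intercept
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    clf_manual = LogisticRegression(random_state=0, fit_intercept=False,
                                    multi_class='ovr', solver='liblinear').fit(X1, y)

    clf_manual.coef_[:, -1]  # plays the role of the bias; close to clf.intercept_ above
    clf_manual.intercept_    # zero-valued, since fit_intercept=False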