python, scikit-learn, logistic-regression

Dummy Variables in Python SKLearn Logistic Regression


I am using logistic regression in scikit-learn to classify data into one of 5 classes. To train the model I have a matrix of observations Y and a matrix of features X.

Sometimes Y contains no observations of a given category (say, category 3). In that case, when I call the predict_proba(X) method, I would like to get a list of 5 probabilities whose 3rd entry is 0, since there are no category 3 observations. Instead, that probability is simply omitted and a list of only 4 probabilities is returned.
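A minimal reproduction of the behaviour, assuming toy data with labels 1 to 5 where class 3 never appears in training (the data are made up for illustration). The columns of predict_proba follow the fitted model's classes_ attribute, which only lists the classes seen during fit:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative toy data: 5 possible classes labelled 1..5,
# but the training labels happen to contain no class 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.array([1, 2, 4, 5] * 10)

clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.classes_)                    # [1 2 4 5]
print(clf.predict_proba(X[:3]).shape)  # (3, 4): one column per *seen* class
```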

How can I change the logistic regression object to do this?


Solution

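  • The columns returned by predict_proba follow the order of the fitted model's classes_ attribute, which only contains the classes present in the training labels. To get a fixed multi-class layout with one column for every class (including classes absent from the training data), use the label-binarization tools in the sklearn.preprocessing module.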

    Reference: http://scikit-learn.org/stable/modules/preprocessing.html#label-binarization
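One way to apply this is sketched below, assuming the complete label set (here 1 to 5) is known in advance. A LabelBinarizer fitted on all 5 labels always produces 5 columns, so binarizing clf.classes_ gives a placement matrix that scatters each predicted column into its slot and leaves unseen classes at 0. The helper full_predict_proba and the ALL_CLASSES constant are hypothetical names for illustration, not part of scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelBinarizer

# Assumption: the complete label set is known up front (here 1..5).
ALL_CLASSES = [1, 2, 3, 4, 5]

# Fit the binarizer on the full label set so it always yields 5 columns,
# even for classes missing from a particular training set.
lb = LabelBinarizer().fit(ALL_CLASSES)

def full_predict_proba(clf, X):
    """Hypothetical helper: predict_proba with one column per class in ALL_CLASSES.

    Each row of lb.transform(clf.classes_) is the one-hot position of a
    trained class within the full label set, so the matrix product moves
    each trained column into its slot and leaves unseen classes at 0.
    """
    proba = clf.predict_proba(X)            # (n_samples, len(clf.classes_))
    placement = lb.transform(clf.classes_)  # (len(clf.classes_), len(ALL_CLASSES))
    return proba @ placement                # (n_samples, len(ALL_CLASSES))

# Illustrative toy data: labels 1..5, but no class 3 in the training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.array([1, 2, 4, 5] * 10)
clf = LogisticRegression(max_iter=1000).fit(X, y)

probs = full_predict_proba(clf, X[:3])
print(probs.shape)  # (3, 5)
```

Note that with only two classes in ALL_CLASSES, LabelBinarizer returns a single column, so this sketch assumes the multi-class case (3 or more labels) described in the question.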