python scikit-learn classification text-classification multilabel-classification

Multi-class classification in Python

I am in the process of converting a binary classification problem into a multi-label classification problem. The code is written in Python.

Below is the existing code:

positive_labels = [[0, 1] for _ in positive_examples]
negative_labels = [[1, 0] for _ in negative_examples]

Now I would like to convert this to a multi-label setup with 3 classes (0, 1, 2):

positive_labels = [[1,0,0] for _ in positive_examples]
neutral_labels = [[0,1,0] for _ in neutral_examples]
negative_labels = [[0,0,1] for _ in negative_examples]

Is this correct? If not, could you please let me know how to do this?

Please help.

Solution

You could use MultiLabelBinarizer from scikit-learn for this:

from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
# fit_transform takes one collection of labels per sample (row)
mlb.fit_transform([(0,), (1,), (1, 2)])

You get output like the following:

array([[1, 0, 0],
       [0, 1, 0],
       [0, 1, 1]])
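
To check which column corresponds to which label, you can inspect the fitted binarizer's classes_ attribute. A minimal sketch, reusing the same three rows of labels as above:

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform([(0,), (1,), (1, 2)])

# classes_ lists the labels in column order:
# column i of `encoded` corresponds to mlb.classes_[i]
print(mlb.classes_)
print(encoded)
```

Here the labels are already the integers 0, 1, 2, so the columns line up with the label values in sorted order.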

fit_transform is part of scikit-learn's transformer interface (see TransformerMixin: http://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html). It fits the binarizer to the labels you pass in and then transforms them. Once you have called fit_transform, there is no need to call fit again; you just call transform, as shown below:

mlb.transform([(1,2),(0,1)]) 

array([[0, 1, 1],
       [1, 1, 0]])
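
Applied to the three lists from the question, this might look like the sketch below. The example strings are hypothetical placeholders, and the mapping 0 = positive, 1 = neutral, 2 = negative is an assumption chosen to match the one-hot layout in the question. Passing classes=[0, 1, 2] fixes the column order explicitly.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical placeholder data standing in for the real examples
positive_examples = ["great", "loved it"]
neutral_examples = ["it was ok"]
negative_examples = ["terrible"]

# One label tuple per example (assumed mapping:
# 0 = positive, 1 = neutral, 2 = negative)
labels = (
    [(0,)] * len(positive_examples)
    + [(1,)] * len(neutral_examples)
    + [(2,)] * len(negative_examples)
)

# classes=[0, 1, 2] pins the column order to positive, neutral, negative
mlb = MultiLabelBinarizer(classes=[0, 1, 2])
y = mlb.fit_transform(labels)
print(y)
```

Since every example carries exactly one label here, each row has a single 1, which is the same shape of output as the hand-written lists in the question. If an example can belong to several classes at once, pass a longer tuple such as (0, 1) for that row.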