Tags: python, scikit-learn, classification, text-classification, multilabel-classification

Multi-class classification in Python


I am in the process of converting a binary classification program into a multi-label classification program. The code is written in Python.

Below is the existing code:

positive_labels = [[0, 1] for _ in positive_examples]
negative_labels = [[1, 0] for _ in negative_examples]

Now I would like to convert this to a multi-label setup with three classes (0, 1, 2):

positive_labels = [[1,0,0] for _ in positive_examples]
neutral_labels = [[0,1,0] for _ in neutral_examples]
negative_labels = [[0,0,1] for _ in negative_examples]

Is this correct? If not, could you please let me know how to do this?

Please help.


Solution

  • You could use MultiLabelBinarizer from scikit-learn for this:

    from sklearn.preprocessing import MultiLabelBinarizer

    mlb = MultiLabelBinarizer()
    # fit_transform takes one tuple of labels per sample (row)
    mlb.fit_transform([(0,), (1,), (1, 2)])
    

    You get the output shown below:

    array([[1, 0, 0],
           [0, 1, 0],
           [0, 1, 1]])
    

    The fit_transform method is part of the TransformerMixin interface (http://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html). It fits the binarizer (learning the set of classes from the data) and then transforms that same data. Once you have called fit_transform, there is no need to call fit again; just call transform, as shown below:

    mlb.transform([(1,2),(0,1)]) 
    
    array([[0, 1, 1],
           [1, 1, 0]])
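As a minimal sketch, assuming hypothetical placeholder lists in place of the question's positive_examples, neutral_examples and negative_examples, the three-class labels from the question can be built with MultiLabelBinarizer by giving each example a one-element label tuple. The classes_ attribute and inverse_transform are used here only to check the mapping.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical placeholder data standing in for the question's example lists.
positive_examples = ["great movie", "loved it"]
neutral_examples = ["it was a movie"]
negative_examples = ["boring", "awful plot", "fell asleep"]

# One single-label tuple per example, in the question's order:
# 0 = positive, 1 = neutral, 2 = negative.
label_sets = (
    [(0,) for _ in positive_examples]
    + [(1,) for _ in neutral_examples]
    + [(2,) for _ in negative_examples]
)

mlb = MultiLabelBinarizer()
labels = mlb.fit_transform(label_sets)  # learns classes_ and encodes in one step

print(mlb.classes_)  # [0 1 2]
print(labels[0], labels[2], labels[3])  # [1 0 0] [0 1 0] [0 0 1]

# inverse_transform maps indicator rows back to tuples of class labels.
decoded = mlb.inverse_transform(labels)
print(decoded[:3])
```

Because every example carries exactly one label here, each row contains a single 1, matching the vectors written by hand in the question. The example texts fed to the model must be concatenated in the same positive, neutral, negative order as label_sets so that texts and labels stay aligned.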