I am in the process of converting a binary classification problem into a multi-label classification problem. The code is written in Python.
This is the existing code:
positive_labels = [[0, 1] for _ in positive_examples]
negative_labels = [[1, 0] for _ in negative_examples]
Now I would like to convert this into a multi-label setup with 3 classes (0, 1, 2):
positive_labels = [[1,0,0] for _ in positive_examples]
neutral_labels = [[0,1,0] for _ in neutral_examples]
negative_labels = [[0,0,1] for _ in negative_examples]
Is this correct? If not, could you please let me know how to do this?
Please help.
You could use MultiLabelBinarizer from scikit-learn for this:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
# fit_transform takes one collection of labels per sample
mlb.fit_transform([(0,), (1,),(1,2)])
You get output like this:
array([[1, 0, 0],
       [0, 1, 0],
       [0, 1, 1]])
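If it is unclear which column corresponds to which label, the fitted binarizer exposes the labels it learned in its classes_ attribute, in column order. A minimal sketch, reusing the same input as above:

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform([(0,), (1,), (1, 2)])

# classes_ holds the sorted labels seen during fit;
# column i of the encoded output corresponds to classes_[i]
print(mlb.classes_)
print(encoded)
```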
The fit_transform method is part of the TransformerMixin interface (http://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html). It fits the binarizer to the labels you pass in and then transforms them in one step. Once you have called fit_transform, there is no need to call fit again; you just call transform, as shown below:
mlb.transform([(1,2),(0,1)])
array([[0, 1, 1],
       [1, 1, 0]])
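Applied to the three-class case from the question, a sketch might look like the following. The example lists and the mapping of positive/neutral/negative to class ids 0/1/2 are assumptions made for illustration, not something taken from your data. Passing classes=[0, 1, 2] pins the column order even if one class happens to be missing from the input.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical stand-ins for the question's data
positive_examples = ["great", "loved it"]
neutral_examples = ["it was okay"]
negative_examples = ["terrible"]

# Assumed mapping: positive -> 0, neutral -> 1, negative -> 2
label_sets = (
    [(0,) for _ in positive_examples]
    + [(1,) for _ in neutral_examples]
    + [(2,) for _ in negative_examples]
)

# Fix the column order explicitly instead of inferring it from the data
mlb = MultiLabelBinarizer(classes=[0, 1, 2])
labels = mlb.fit_transform(label_sets)
print(labels)
```

With exactly one label per example, this produces the same rows as the hand-written lists in the question, so those are fine as they are. The binarizer mainly pays off once a single example can carry several labels at once, as in the (1, 2) row above.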