pandas dataframe scikit-learn sklearn-pandas multilabel-classification

Multilabel Encoder takes whole value instead of array

I'm working on a dataset with a Tags column extracted from a Stack Overflow dataset. I need to encode these tags so I can predict them from a question's title and body.

I'm stuck on this encoding and can't get the result I need.

Here's a preview of my column:

Tags
['python', 'authentication', 'login', 'flask', 'python-2.x']
['c++', 'vector', 'c++11', 'move', 'deque']
...

And here's what I'm doing so far:

    y_classes = pd.get_dummies(df.Tags)
    y_classes

['.net', 'asp.net-mvc', 'visual-studio', 'asp.net-mvc-4', 'intellisense']	['.net', 'asp.net-mvc-3', 'linq', 'entity-framework', 'entity-framework-5']
0	0	0
0	0	0
0	0	0

As you can see, I get one column for each unique array of tags, whereas I need one column for each tag. I tried several solutions found on Stack Overflow, but none of them worked.

EDIT: I also tried MultiLabelBinarizer from sklearn.preprocessing, and I got one column for each unique character of the Tags column.

How can I make this work?

Solution

OK, I figured out how to fix this problem myself, so here is my solution:

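    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer

    # Copy the Tags column into its own single-column DataFrame
    tags_array = df['Tags'].to_numpy()
    df2 = pd.DataFrame(tags_array, columns=['Tags'])

    # Count how often each token occurs in each row's tag string
    coun_vect = CountVectorizer()
    count_matrix = coun_vect.fit_transform(df2["Tags"])
    count_array = count_matrix.toarray()

    # One column per token; get_feature_names() was removed in
    # scikit-learn 1.2, get_feature_names_out() replaces it
    df2 = pd.DataFrame(data=count_array,
                       columns=coun_vect.get_feature_names_out())
    print(df2)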

Output:

ajax	algorithm	amazon	android	angular	...
0	0	0	1	0	...
1	1	0	0	0	...
0	0	1	0	1	...
...	...	...	...	...	...
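
One caveat with the CountVectorizer approach: its default token pattern only keeps runs of two or more word characters, so tags such as 'c++' or 'python-2.x' get split or dropped. Below is a minimal sketch of one way around that. It assumes each Tags cell is a string with single-quoted tags, as in the preview; the sample row and the custom `token_pattern` are illustrative choices, not part of the original solution.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample row, assumed to be stored as a string (not a list).
docs = ["['c++', 'vector', 'c++11', 'move', 'deque']"]

# Default tokenizer: only runs of 2+ word characters survive,
# so 'c++' disappears and 'c++11' is reduced to '11'.
default_vect = CountVectorizer()
default_vect.fit(docs)
default_tokens = sorted(default_vect.vocabulary_)

# Custom pattern: capture whatever sits between single quotes as one token.
tag_vect = CountVectorizer(token_pattern=r"'([^']+)'")
tag_vect.fit(docs)
tag_tokens = sorted(tag_vect.vocabulary_)

print(default_tokens)
print(tag_tokens)
```

Comparing the two printed vocabularies shows whether the default tokenization is losing tags in your own data.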

Edit:

As @OllieStanley said in a comment, this could also have worked with MultiLabelBinarizer. The problem was the shape of the data I passed in: it has to be a collection of label collections, one per row, for example a nested list or a list of sets.
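
For reference, here is a minimal sketch of the MultiLabelBinarizer route. It assumes each Tags cell is stored as the string representation of a list, which the one-column-per-character result suggests but the question does not confirm. The two-row frame is made up from the preview above, and `ast.literal_eval` is just one way to turn those strings back into real lists.

```python
import ast

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical two-row frame mirroring the preview in the question;
# each cell is assumed to be a string, not a real list.
df = pd.DataFrame({
    "Tags": [
        "['python', 'authentication', 'login', 'flask', 'python-2.x']",
        "['c++', 'vector', 'c++11', 'move', 'deque']",
    ]
})

# Turn each string into an actual Python list of tags.
tag_lists = df["Tags"].apply(ast.literal_eval)

# MultiLabelBinarizer expects one iterable of labels per row.
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(tag_lists)

# One 0/1 column per distinct tag, named after the tag.
y_classes = pd.DataFrame(encoded, columns=mlb.classes_, index=df.index)
print(y_classes)
```

If the column already holds real lists, the `ast.literal_eval` step can be skipped and `df["Tags"]` passed to `fit_transform` directly.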