This is my data:
No  Text
1   You are smart
2   You are beautiful
My expected output:
No  Text               You  are  smart  beautiful
1   You are smart      1    1    1      0
2   You are beautiful  1    1    0      1
For an nltk solution, use word_tokenize to get a list of words for each row, then MultiLabelBinarizer to build the indicator columns, and finally join the result back to the original DataFrame:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
from nltk import word_tokenize
mlb = MultiLabelBinarizer()
# tokenize each row's Text into a list of words
s = df['Text'].apply(word_tokenize)
# one 0/1 column per distinct word, aligned on the original index
df = df.join(pd.DataFrame(mlb.fit_transform(s), columns=mlb.classes_, index=df.index))
print(df)
   No               Text  You  are  beautiful  smart
0   1      You are smart    1    1          0      1
1   2  You are beautiful    1    1          1      0
For a pure pandas solution, use str.get_dummies + join:
df = df.join(df['Text'].str.get_dummies(sep=' '))
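As a self-contained sketch of the pandas approach, the snippet below rebuilds the sample data from the question and applies str.get_dummies. The column reordering step is an addition, not part of the answer above: get_dummies returns the word columns sorted, so the hypothetical `order` list is used here to put them in first-appearance order, as in the expected output. It assumes words are separated by single spaces and that no word collides with an existing column name.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({"No": [1, 2],
                   "Text": ["You are smart", "You are beautiful"]})

# One 0/1 indicator column per distinct space-separated word
# (get_dummies returns the columns in sorted order)
dummies = df["Text"].str.get_dummies(sep=" ")

# Assumption for illustration: reorder columns by first appearance
# so they match the layout of the expected output
order = list(dict.fromkeys(" ".join(df["Text"]).split()))

result = df.join(dummies[order])
print(result)
```

If the column order does not matter, the reordering step can be dropped and `df.join(dummies)` used directly.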