I want to convert a DataFrame with user_Id and skills columns into a zero-one (indicator) DataFrame that shows, for each user, which skills they have.
Input DataFrame
   user_Id  skills
0  user1    "java, hdfs, hadoop"
1  user2    "python, c++, c"
2  user3    "hadoop, java, hdfs"
3  user4    "html, java, php"
4  user5    "hadoop, php, hdfs"
Desired Output DataFrame
user_Id  java  c  c++  hadoop  hdfs  python  html  php
user1       1  0    0       1     1       0     0    0
user2       0  1    1       0     0       1     0    0
user3       1  0    0       1     1       0     0    0
user4       1  0    0       0     0       0     1    1
user5       0  0    0       1     1       0     0    1
For me, str.get_dummies + concat works:
df1 = df['skills'].str.get_dummies(', ')
print(df1)
   c  c++  hadoop  hdfs  html  java  php  python
0  0    0       1     1     0     1    0       0
1  1    1       0     0     0     0    0       1
2  0    0       1     1     0     1    0       0
3  0    0       0     0     1     1    1       0
4  0    0       1     1     0     0    1       0
df = pd.concat([df['user_Id'], df1], axis=1)
print(df)
   user_Id  c  c++  hadoop  hdfs  html  java  php  python
0    user1  0    0       1     1     0     1    0       0
1    user2  1    1       0     0     0     0    0       1
2    user3  0    0       1     1     0     1    0       0
3    user4  0    0       0     0     1     1    1       0
4    user5  0    0       1     1     0     0    1       0
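For reference, here is a self-contained sketch of the steps above. It assumes the skills column holds plain strings such as java, hdfs, hadoop (without literal quote characters); the names dummies, result and desired_order are only illustrative.

```python
import pandas as pd

# Rebuild the input from the question (assumption: values contain no literal quotes).
df = pd.DataFrame({
    "user_Id": ["user1", "user2", "user3", "user4", "user5"],
    "skills": [
        "java, hdfs, hadoop",
        "python, c++, c",
        "hadoop, java, hdfs",
        "html, java, php",
        "hadoop, php, hdfs",
    ],
})

# Split each string on ", " and build one 0/1 column per distinct skill.
dummies = df["skills"].str.get_dummies(sep=", ")

# Put the user column back in front of the indicator columns.
result = pd.concat([df["user_Id"], dummies], axis=1)

# get_dummies sorts the new columns alphabetically; to match the order
# from the question, select the columns explicitly.
desired_order = ["user_Id", "java", "c", "c++", "hadoop", "hdfs", "python", "html", "php"]
result_ordered = result[desired_order]
print(result_ordered)
```

The explicit column selection at the end is optional and only matters if the exact column order from the question is required.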
EDIT:
If the values are separated by a comma with no space after it, use:
df1 = df['skills'].str.get_dummies(',')
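If the spacing around the commas is inconsistent, neither separator will match every row, because get_dummies splits on the exact separator string and does not trim whitespace. One possible workaround, as a rough sketch, is to normalize the strings first; the sample data and the names skills and normalized are made up for illustration.

```python
import pandas as pd

# Hypothetical input with mixed spacing around the commas.
skills = pd.Series(["java,hdfs, hadoop", "python ,c++,  c"])

# Trim the ends and collapse any whitespace around each comma,
# so that a plain "," separator works for every row.
normalized = skills.str.strip().str.replace(r"\s*,\s*", ",", regex=True)

dummies = normalized.str.get_dummies(sep=",")
print(dummies)
```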