python-2.7, pandas, dataframe, sklearn-pandas

Convert a DataFrame of word lists into a zero-one matrix DataFrame in Python pandas


I want to convert a DataFrame with user_Id and skills columns into a zero-one matrix DataFrame: one row per user and one column per skill, holding 1 if the user has that skill and 0 otherwise.

Input DataFrame

     user_Id     skills
0    user1       [java, hdfs, hadoop]
1    user2       [python, c++, c]
2    user3       [hadoop, java, hdfs]
3    user4       [html, java, php]
4    user5       [hadoop, php, hdfs]

Desired Output DataFrame

user_Id   java  c   c++   hadoop  hdfs  python  html  php
user1     1     0   0     1       1     0       0     0
user2     0     1   1     0       0     1       0     0
user3     1     0   0     1       1     0       0     0
user4     1     0   0     0       0     0       1     1
user5     0     0   0     1       1     0       0     1

Solution

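  • You can build the indicator columns with str.get_dummies and join them back to user_Id. If skills holds lists, first convert them to strings with astype(str) (omit this step if they are already strings), then remove the [] with str.strip: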

    df = df[['user_Id']].join(df['skills'].astype(str).str.strip('[]').str.get_dummies(', '))
    print (df)
      user_Id  c  c++  hadoop  hdfs  html  java  php  python
    0   user1  0    0       1     1     0     1    0       0
    1   user2  1    1       0     0     0     0    0       1
    2   user3  0    0       1     1     0     1    0       0
    3   user4  0    0       0     0     1     1    1       0
    4   user5  0    0       1     1     0     0    1       0

  • If skills holds real Python lists, astype(str) leaves quotes around each name, so also strip them from the column names. This is an alternative to the snippet above and starts again from the original df:
    

    df1 = df['skills'].astype(str).str.strip('[]').str.get_dummies(', ')
    # if necessary, remove ' from the column names
    df1.columns = df1.columns.str.strip("'")
    df = pd.concat([df['user_Id'], df1], axis=1)
    print (df)
      user_Id  c  c++  hadoop  hdfs  html  java  php  python
    0   user1  0    0       1     1     0     1    0       0
    1   user2  1    1       0     0     0     0    0       1
    2   user3  0    0       1     1     0     1    0       0
    3   user4  0    0       0     0     1     1    1       0
    4   user5  0    0       1     1     0     0    1       0
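
  • As a possible alternative when skills holds real Python lists, the lists can be joined with a separator and passed to str.get_dummies directly, which avoids the string conversion and the quote stripping. The sketch below is self-contained and rebuilds the sample data from the question. The names dummies, result and wanted are illustrative, and it assumes no skill name contains the '|' character:

```python
import pandas as pd

# Sample data from the question; 'skills' is assumed to hold real Python lists.
df = pd.DataFrame({
    'user_Id': ['user1', 'user2', 'user3', 'user4', 'user5'],
    'skills': [
        ['java', 'hdfs', 'hadoop'],
        ['python', 'c++', 'c'],
        ['hadoop', 'java', 'hdfs'],
        ['html', 'java', 'php'],
        ['hadoop', 'php', 'hdfs'],
    ],
})

# Join each list into one '|'-separated string, then expand it into 0/1 columns.
# '|' is the default separator of str.get_dummies.
dummies = df['skills'].str.join('|').str.get_dummies()

# Put user_Id back in front of the indicator columns.
result = df[['user_Id']].join(dummies)

# Optional: reorder the columns to match the desired output in the question.
wanted = ['user_Id', 'java', 'c', 'c++', 'hadoop', 'hdfs', 'python', 'html', 'php']
result = result[wanted]
print(result)
```

    Without the optional reordering step, get_dummies returns the skill columns in sorted order, as in the outputs above.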