python-2.7 pandas dataframe sklearn-pandas

Convert a pandas DataFrame of words into a zero/one (indicator) DataFrame


Input

userID  col1        col2        col3        col4        col5        col6        col7        col8        col9
1       Java        c           c++         php         python      perl        html        hadoop      nodejs
2       nodejs      c#          c++         oops        css         html        angular     java        php
3       php         python      html        java        angular     hadoop      c           nodejs      c#
4       python      php         css         perl        hadoop      c           nodejs      c#          html
5       perl        css         python      hadoop      c           nodejs      c#          java        php
6       Java        python      css         perl        nodejs      c#          java        php         hadoop
7       javascript  java        perl        nodejs      angular     php         mysql       hadoop      html
8       angular     mysql       mongodb     cs          hadoop      angular     oops        html        perl
9       nodejs      hadoop      mysql       mongodb     angular     oops        html        python      java

Desired output

userID  Java    C       C++     php     python  perl    html    hadoop  nodejs  oops    mysql   mongo
1       1       1       1       1       1       1       1       1       1       0       0       0
2       1       0       1       1       0       0       1       0       1       0       0       0
3       1       1       0       1       1       1       1       1       1       0       0       0
4       0       0       0       0       1       1       1       0       1       1       1       1

Solution

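  • Use get_dummies to build one indicator column per word, then groupby on the column names and aggregate with max, so that a word appearing in several input columns collapses into a single 0/1 column: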

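    import pandas as pd

    # One indicator column per (input column, word); the empty prefix keeps only the word as the name
    df = pd.get_dummies(df.set_index('userID'), prefix='', prefix_sep='')
    # Merge columns that share a name (same word from different colN) by taking the max
    df = df.groupby(level=0, axis=1).max().reset_index()
    print(df)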
       userID  Java  angular  c  c#  c++  cs  css  hadoop  html  java  javascript  \
    0       1     1        0  1   0    1   0    0       1     1     0           0   
    1       2     0        1  0   1    1   0    1       0     1     1           0   
    2       3     0        1  1   1    0   0    0       1     1     1           0   
    3       4     0        0  1   1    0   0    1       1     1     0           0   
    4       5     0        0  1   1    0   0    1       1     0     1           0   
    5       6     1        0  0   1    0   0    1       1     0     1           0   
    6       7     0        1  0   0    0   0    0       1     1     1           1   
    7       8     0        1  0   0    0   1    0       1     1     0           0   
    8       9     0        1  0   0    0   0    0       1     1     1           0   
    
       mongodb  mysql  nodejs  oops  perl  php  python  
    0        0      0       1     0     1    1       1  
    1        0      0       1     1     0    1       0  
    2        0      0       1     0     0    1       1  
    3        0      0       1     0     1    1       1  
    4        0      0       1     0     1    1       1  
    5        0      0       1     0     1    1       1  
    6        0      1       1     0     1    1       0  
    7        1      1       0     1     1    0       0  
    8        1      1       1     1     0    0       1
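
Note that the output above keeps Java and java as separate columns, because the matching is case-sensitive. As a hedged alternative, here is a minimal sketch that lowercases the words first and uses melt plus pd.crosstab instead of a column-wise groupby (which newer pandas releases deprecate). The three-row sample frame and the names `long` and `onehot` are made up for illustration and are not part of the original question:

```python
import pandas as pd

# Hypothetical sample in the same shape as the input (not the full data set)
df = pd.DataFrame({
    "userID": [1, 2, 3],
    "col1": ["Java", "nodejs", "php"],
    "col2": ["c", "c#", "python"],
    "col3": ["java", "java", "html"],
})

# Long format: one row per (userID, word)
long = df.melt(id_vars="userID", value_name="word")

# Normalise case and stray whitespace so "Java" and "java" count as one word
long["word"] = long["word"].str.strip().str.lower()

# crosstab counts each word per user; clip turns counts into 0/1 flags
onehot = pd.crosstab(long["userID"], long["word"]).clip(upper=1).reset_index()
onehot.columns.name = None  # drop the leftover "word" label on the column axis

print(onehot)
```

Whether lowercasing is appropriate depends on the data; if case carries meaning, drop the str.lower() call.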