python, pandas, sparse-matrix

Convert two-column DataFrame to occurrence matrix in pandas


Hi all, I have a CSV file that contains data in the format below:

A   a
A   b
B   f
B   g
B   e
B   h
C   d
C   e
C   f

The first column contains items, and the second column contains the features available for each item, drawn from the feature vector [a,b,c,d,e,f,g,h]. I want to convert this to an occurrence matrix that looks like this:

    a,b,c,d,e,f,g,h
A   1,1,0,0,0,0,0,0
B   0,0,0,0,1,1,1,1
C   0,0,0,1,1,1,0,0

Can anyone tell me how to do this using pandas?


Solution

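  • One way to do it is with pd.get_dummies(): one-hot encode the feature column, then take the per-item maximum. Features that never occur in the data (here c) get no column in the result.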

    import pandas as pd
    
    # your data
    # =======================
    df = pd.DataFrame({'col1': list('AABBBBCCC'), 'col2': list('abfgehdef')})
    df
    
      col1 col2
    0    A    a
    1    A    b
    2    B    f
    3    B    g
    4    B    e
    5    B    h
    6    C    d
    7    C    e
    8    C    f
    
    # processing
    # ===================================
    # one-hot encode col2 as 0/1, then take the per-item maximum of each feature column
    pd.get_dummies(df.col2, dtype=int).groupby(df.col1).max()
    
          a  b  d  e  f  g  h
    col1                     
    A     1  1  0  0  0  0  0
    B     0  0  0  1  1  1  1
    C     0  0  1  1  1  0  0
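
The result above has no column for c, because c never appears in the data, while the matrix in the question has one. Below is a minimal sketch of one way to get all eight columns, assuming the same col1/col2 column names as above (they are not given in the question) and using pd.crosstab() instead of pd.get_dummies(). The clip step only matters if an item/feature pair can appear more than once.

```python
import pandas as pd

# Same two-column data as in the question; the column names are assumed.
df = pd.DataFrame({
    "col1": list("AABBBBCCC"),
    "col2": list("abfgehdef"),
})

# Full feature vector from the question, including 'c', which never occurs.
features = list("abcdefgh")

# Count item/feature pairs, cap the counts at 1 so repeated pairs still
# give 0/1, then add the missing feature columns filled with 0.
occurrence = (
    pd.crosstab(df["col1"], df["col2"])
    .clip(upper=1)
    .reindex(columns=features, fill_value=0)
)
print(occurrence)
```

The same .reindex(columns=features, fill_value=0) call should also work on the get_dummies result if you prefer to keep that approach.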