python, pandas, boolean-operations

Pandas: replace a single column (field) of strings with one column for each string


Say I have the following dataframe:

    Colors                  
0   red, white, blue
1   white, blue
2   blue, red
3   white
4   blue

where each unique value in column "Colors" needs to become its own column, populated with Boolean (0/1) indicators. Example:

   red  white  blue  white,blue  blue,red  red,white,blue
0    0      0     0           0         0               1
1    0      0     0           1         0               0
2    0      0     0           0         1               0
3    0      1     0           0         0               0
4    0      0     1           0         0               0

I'm looking for suggestions on how to approach this.


Solution

  • Use:

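    # One indicator column per distinct string in "Colors".
    # dtype=int keeps the 0/1 output shown here (pandas 2.0+ defaults to bool).
    df = pd.get_dummies(df['Colors'], dtype=int)
    print(df)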
       blue  blue, red  red, white, blue  white  white, blue
    0     0          0                 1      0            0
    1     0          0                 0      0            1
    2     0          1                 0      0            0
    3     0          0                 0      1            0
    4     1          0                 0      0            0
    

    Or, for one column per individual color, split each string on ', ':

    # Split each string on ', ' and emit one 0/1 column per token
    df = df['Colors'].str.get_dummies(', ')
    print(df)
       blue  red  white
    0     1    1      1
    1     1    0      1
    2     1    1      0
    3     0    0      1
    4     1    0      0
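
For anyone who wants to try both variants side by side, here is a minimal self-contained sketch. It rebuilds the frame from the question and keeps the two results in separate variables instead of overwriting df. The names per_string, per_color and per_color_bool are only illustrative, and the final astype(bool) step is an assumption about what "Boolean indices" was meant to cover, not part of the answer above.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "Colors": ["red, white, blue", "white, blue", "blue, red", "white", "blue"]
})

# Variant 1: one indicator column per distinct full string.
# dtype=int forces 0/1; pandas 2.0+ would otherwise return True/False.
per_string = pd.get_dummies(df["Colors"], dtype=int)

# Variant 2: one indicator column per individual color, splitting on ", "
per_color = df["Colors"].str.get_dummies(sep=", ")

# Optional: real booleans instead of 0/1, usable directly as row masks
per_color_bool = per_color.astype(bool)

print(per_string)
print(per_color)
print(df[per_color_bool["red"]])  # rows whose Colors entry contains "red"
```

Variant 1 matches the layout asked for in the question (one column per distinct string), while variant 2 is usually easier to filter on, since each color gets exactly one column.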