Say I have the following dataframe:
             Colors
0  red, white, blue
1       white, blue
2         blue, red
3             white
4              blue
Each unique value in the "Colors" column needs to become its own column, so that these columns can be filled with Boolean (0/1) indicators. Example:
   red  white  blue  white,blue  blue,red  red,white,blue
0    0      0     0           0         0               1
1    0      0     0           1         0               0
2    0      0     0           0         1               0
3    0      1     0           0         0               0
4    0      0     1           0         0               0
Any suggestions on how to approach this?
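Use pd.get_dummies, which creates one indicator column per distinct value. In pandas 2.0 and later the default dtype is bool, so pass dtype=int to get 0/1:
import pandas as pd
df = pd.get_dummies(df['Colors'], dtype=int)
print(df)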
   blue  blue, red  red, white, blue  white  white, blue
0     0          0                 1      0            0
1     0          0                 0      0            1
2     0          1                 0      0            0
3     0          0                 0      1            0
4     1          0                 0      0            0
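A self-contained version of the first approach, assuming pandas is installed and the data is exactly the sample from the question. Every full string, including the combinations, becomes its own column, sorted alphabetically; dtype=int is there only to reproduce the 0/1 output shown above on pandas 2.0+.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Colors": ["red, white, blue", "white, blue", "blue, red", "white", "blue"]})

# One indicator column per distinct full string (combinations are not split)
dummies = pd.get_dummies(df["Colors"], dtype=int)
print(dummies)
```

Drop dtype=int if True/False columns are acceptable.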
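Or, if you want one column per individual color, split each string on ', ' with Series.str.get_dummies:
df = df['Colors'].str.get_dummies(sep=', ')
print(df)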
   blue  red  white
0     1    1      1
1     1    0      1
2     1    1      0
3     0    0      1
4     1    0      0
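A minimal sketch of the second approach, assuming the separator is always exactly ', ' (inconsistent spacing such as "blue,red" would produce different column names). It joins the indicators back onto the original frame and, optionally, converts them to True/False since the question mentions Boolean values.

```python
import pandas as pd

df = pd.DataFrame({"Colors": ["red, white, blue", "white, blue", "blue, red", "white", "blue"]})

# Split each string on ", " and create one 0/1 column per individual color
dummies = df["Colors"].str.get_dummies(sep=", ")

# Keep the original Colors column next to the indicator columns
result = df.join(dummies)

# Optional: Boolean flags instead of 0/1 integers
flags = dummies.astype(bool)
print(result)
```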