Let's say I have a pandas dataframe that looks like the following:
car       colors
corvette  {"colors": ["red", "black"]}
forester  {"colors": ["white", "silver", "black"]}
I'd like to one-hot encode the colors of each car like so:
car       black  red  white  silver
corvette  1      1    0      0
forester  1      0    1      1
What's a nice elegant way to accomplish this?
Try this:
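import pandas as pd

# Unpack the dicts, give each color its own row, then one-hot encode.
# groupby(level=0).sum() collapses the rows back to one per car; it replaces
# the older .sum(level=0), which recent pandas versions no longer accept.
(df.drop('colors', axis=1)
   .join(pd.get_dummies(pd.DataFrame.from_records(df.colors.values)
                        ['colors'].explode())
           .groupby(level=0).sum()
   )
)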
Output:
        car  black  red  silver  white
0  corvette      1    1       0      0
1  forester      1    0       1      1
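To try this end to end, here is a minimal sketch that rebuilds the example frame from the question and applies the same explode-then-get_dummies idea. It assumes each cell in `colors` is a dict with a single "colors" key holding a list, and that the frame's index is unique. The names `exploded`, `dummies`, and `result` are only illustrative.

```python
import pandas as pd

# Example frame from the question (assumed shape: each cell is a dict).
df = pd.DataFrame({
    "car": ["corvette", "forester"],
    "colors": [
        {"colors": ["red", "black"]},
        {"colors": ["white", "silver", "black"]},
    ],
})

# Pull the list out of each dict, then give every color its own row.
# explode() repeats the original index label for each list element.
exploded = df["colors"].apply(lambda d: d["colors"]).explode()

# One-hot encode, then sum rows that share an index label so there is
# one row per car again. astype(int) turns the counts into plain integers.
dummies = pd.get_dummies(exploded).groupby(level=0).sum().astype(int)

result = df.drop(columns="colors").join(dummies)
print(result)
```

Because this version works from the frame's own index instead of building a fresh one with `from_records`, it should also line up correctly when `df` does not have the default 0..n-1 index.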