I have a pandas DataFrame in which one of the columns looks like this:
Genre
------------
Documentary
Documentary
Comedy|Mystery|Thriller
Animation|Comedy|Family
Documentary
Documentary|Family
Action|Adventure|Fantasy|Sci-Fi
Crime|Drama|Mystery
Action|Crime|Mystery|Thriller
How can I create one column per genre name, filled with 1 if the row contains that genre and 0 otherwise?
Expected output (a pandas DataFrame):
Documentary Comedy Mystery Thriller Animation Family ......
1 0 0 0 0 0
1 0 0 0 0 0
0 1 1 1 0 0
and so on.
I tried first converting the column to a list and then splitting it, but that is not a Pythonic way to do it.
Can this be done efficiently with apply or some other technique?
Use Series.explode (available since pandas 0.25) + pd.get_dummies:
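# Split each string on '|' and put every genre on its own row (the original index is repeated)
s_explode = df['Genre'].str.split('|').explode()
# One-hot encode the genres, then collapse back to one row per original index
dfc = pd.get_dummies(s_explode).groupby(level=0).sum()
# Attach the indicator columns to the original Genre column
new_df = pd.concat([df['Genre'], dfc], axis=1)
print(new_df)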
Genre Action Adventure Animation Comedy \
0 Documentary 0 0 0 0
1 Documentary 0 0 0 0
2 Comedy|Mystery|Thriller 0 0 0 1
3 Animation|Comedy|Family 0 0 1 1
4 Documentary 0 0 0 0
5 Documentary|Family 0 0 0 0
6 Action|Adventure|Fantasy|Sci-Fi 1 1 0 0
7 Crime|Drama|Mystery 0 0 0 0
8 Action|Crime|Mystery|Thriller 1 0 0 0
Crime Documentary Drama Family Fantasy Mystery Sci-Fi Thriller
0 0 1 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0
2 0 0 0 0 0 1 0 1
3 0 0 0 1 0 0 0 0
4 0 1 0 0 0 0 0 0
5 0 1 0 1 0 0 0 0
6 0 0 0 0 1 0 1 0
7 1 0 1 0 0 1 0 0
8 1 0 0 0 0 1 0 1
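As a possible alternative to the explode approach above, pandas also provides Series.str.get_dummies, which splits on a separator and builds the indicator columns in one call. The following is only a minimal sketch: the DataFrame is rebuilt here from the sample rows in the question so that the snippet runs on its own, and the variable name dummies is chosen just for illustration.

```python
import pandas as pd

# Sample data copied from the question (rebuilt here so the snippet is self-contained).
df = pd.DataFrame({
    "Genre": [
        "Documentary",
        "Documentary",
        "Comedy|Mystery|Thriller",
        "Animation|Comedy|Family",
        "Documentary",
        "Documentary|Family",
        "Action|Adventure|Fantasy|Sci-Fi",
        "Crime|Drama|Mystery",
        "Action|Crime|Mystery|Thriller",
    ]
})

# Split each string on "|" and build one 0/1 indicator column per genre.
dummies = df["Genre"].str.get_dummies(sep="|")

# Attach the indicator columns to the original Genre column.
new_df = pd.concat([df["Genre"], dummies], axis=1)
print(new_df)
```

On this sample the result should have the same layout as the output shown above, with one column per genre and one row per original row. Which of the two approaches is faster likely depends on the data, so it is worth timing both on the real column.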