
Feature engineering using Python


I have a pandas DataFrame in which one of the columns looks like this:

         Genre
        ------------
         Documentary
         Documentary
         Comedy|Mystery|Thriller
         Animation|Comedy|Family
         Documentary
         Documentary|Family
         Action|Adventure|Fantasy|Sci-Fi
         Crime|Drama|Mystery
         Action|Crime|Mystery|Thriller
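
To make the question reproducible, the sample column above can be rebuilt as a DataFrame. This is only a sketch: it assumes the DataFrame is called df and the column Genre (the names the answer below uses), and that the data is exactly the nine rows shown.

```python
import pandas as pd

# The nine sample rows from the question; genres within a row are separated by '|'
df = pd.DataFrame({
    "Genre": [
        "Documentary",
        "Documentary",
        "Comedy|Mystery|Thriller",
        "Animation|Comedy|Family",
        "Documentary",
        "Documentary|Family",
        "Action|Adventure|Fantasy|Sci-Fi",
        "Crime|Drama|Mystery",
        "Action|Crime|Mystery|Thriller",
    ]
})
print(df)
```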

How can I create one column per genre name, filled with 1 if the row contains that genre and 0 otherwise?

Expected output (a pandas DataFrame):

  Documentary  Comedy  Mystery  Thriller  Animation  Family  ......
  1            0       0        0         0          0
  1            0       0        0         0          0
  0            1       1        1         0          0

and so on.

I tried first converting the column to a list and then splitting it, but that does not seem like a Pythonic way to do it.

Can this be done efficiently with the apply function or some other technique?


Solution

  • Use Series.str.split + Series.explode + pd.get_dummies, then sum the dummies back to one row per original row:

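    import pandas as pd

    # split each string on '|' and give every genre its own row (the original index repeats)
    s_explode = df['Genre'].str.split('|').explode()
    # one-hot encode (dtype=int gives 0/1; pandas >= 2.0 defaults to bool), then collapse the repeated index
    dfc = pd.get_dummies(s_explode, dtype=int).groupby(level=0).sum()
    new_df = pd.concat([df['Genre'], dfc], axis=1)
    print(new_df)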
    

                                  Genre  Action  Adventure  Animation  Comedy  \
    0                      Documentary       0          0          0       0   
    1                      Documentary       0          0          0       0   
    2          Comedy|Mystery|Thriller       0          0          0       1   
    3          Animation|Comedy|Family       0          0          1       1   
    4                      Documentary       0          0          0       0   
    5               Documentary|Family       0          0          0       0   
    6  Action|Adventure|Fantasy|Sci-Fi       1          1          0       0   
    7              Crime|Drama|Mystery       0          0          0       0   
    8    Action|Crime|Mystery|Thriller       1          0          0       0   
    
       Crime  Documentary  Drama  Family  Fantasy  Mystery  Sci-Fi  Thriller  
    0      0            1      0       0        0        0       0         0  
    1      0            1      0       0        0        0       0         0  
    2      0            0      0       0        0        1       0         1  
    3      0            0      0       1        0        0       0         0  
    4      0            1      0       0        0        0       0         0  
    5      0            1      0       1        0        0       0         0  
    6      0            0      0       0        1        0       1         0  
    7      1            0      1       0        0        1       0         0  
    8      1            0      0       0        0        1       0         1
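
pandas also has a vectorized string method that does the split and the one-hot encoding in a single call: Series.str.get_dummies, whose sep argument defaults to '|'. The sketch below is an alternative to the explode approach, not part of the original answer; it assumes the same df and Genre column as the question and rebuilds the sample data so it runs on its own. The indicator columns should come back sorted by genre name, matching the output above.

```python
import pandas as pd

# Sample data from the question (rebuilt here so the snippet is self-contained)
df = pd.DataFrame({"Genre": [
    "Documentary",
    "Documentary",
    "Comedy|Mystery|Thriller",
    "Animation|Comedy|Family",
    "Documentary",
    "Documentary|Family",
    "Action|Adventure|Fantasy|Sci-Fi",
    "Crime|Drama|Mystery",
    "Action|Crime|Mystery|Thriller",
]})

# Split on '|' and one-hot encode in one call: one 0/1 column per distinct genre
dummies = df["Genre"].str.get_dummies(sep="|")

# Keep the original column next to the indicators, as in the answer above
new_df = pd.concat([df["Genre"], dummies], axis=1)
print(new_df)
```

This avoids the explode/groupby round trip in user code. Whether it is faster than the explode version depends on the data size and the pandas version, so time both on real data if performance matters.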