
How to remove lists in a column of a pandas data frame for converting categorical values to numerical values


I am trying to use pd.get_dummies() to convert categorical features to numerical ones, but one of my columns contains lists. This is the genre column:

0     ['Action', 'Adventure', 'Comedy', 'Drama', 'Sc...

1     ['Action', 'Drama', 'Mystery', 'Sci-Fi', 'Space']

2     ['Action', 'Sci-Fi', 'Adventure', 'Comedy', 'D...

3     ['Action', 'Magic', 'Police', 'Supernatural', ...

4     ['Adventure', 'Fantasy', 'Shounen', 'Supernatu...

I have tried all the answers on Stack Overflow that address this issue. Nothing works.

I want the output to be

0    'Action', 'Adventure', 'Comedy', 'Drama', 'Sc...

1    'Action', 'Drama', 'Mystery', 'Sci-Fi', 'Space'

2    'Action', 'Sci-Fi', 'Adventure', 'Comedy', 'D...

3    'Action', 'Magic', 'Police', 'Supernatural', ...

4    'Adventure', 'Fantasy', 'Shounen', 'Supernatu...

That way I can use get_dummies to create the dummy variables. Please help!


Solution

  • In pandas 0.25 or later, you can use explode for this, as shown below:

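    import pandas as pd

    # Sample data: each row of "genre" holds a list of labels
    d = {"genre": [['Action', 'Adventure', 'Comedy', 'Drama'],
                   ['Action', 'Drama', 'Mystery', 'Sci-Fi', 'Space'],
                   ['Action', 'Sci-Fi', 'Adventure', 'Comedy'],
                   ['Action', 'Magic', 'Police', 'Supernatural'],
                   ['Adventure', 'Fantasy', 'Shounen', 'Supernatu']]}

    df = pd.DataFrame(d)
    # explode() yields one row per label (the original index is repeated),
    # pivot() spreads the labels into columns, get_dummies() encodes them
    pd.get_dummies(df.explode("genre").pivot(columns="genre", values="genre"))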
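If plain genre names are preferred as column headers, two other routes may be worth trying. This is only a sketch on a made-up three-row sample, and it assumes the column holds real Python lists. If the values are actually strings that look like lists (for example after reading a CSV), they would need to be parsed first, e.g. with ast.literal_eval. The names via_explode and via_join are illustrative only.

```python
import pandas as pd

# Hypothetical three-row sample in the same shape as the "genre" column
df = pd.DataFrame({"genre": [["Action", "Drama"],
                             ["Action", "Sci-Fi"],
                             ["Fantasy"]]})

# Variant 1: one row per label, encode, then collapse back to one row
# per original index (max() keeps a 1 wherever the label occurred)
exploded = df["genre"].explode()
via_explode = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

# Variant 2: join each list into a "|"-separated string (this is the
# "remove the lists" step), then split that string into indicator columns
via_join = df["genre"].str.join("|").str.get_dummies(sep="|")

print(via_explode)
```

Both variants should give one row per original record and one 0/1 column per genre. The second is closest to what the question asks for, since it first turns each list into a plain string; it assumes no genre name contains the "|" separator.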