python, pandas, list, data-representation

How to create binary representations of words in pandas column?


I have a column that contains lists of variable length. The lists hold a limited number of short text values, around 60 unique values altogether.

0    ["AC","BB"]
1    ["AD","CB", "FF"]
2    ["AA","CC"]
3    ["CA","BB"]
4    ["AA"]

I want to turn these values into columns in my DataFrame, where each column is 1 if the value appears in that row's list and 0 if not.

I know I could explode the lists, call unique, and set the results as new columns, but I don't know what to do after that.


Solution

  • Here's one way:

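    df = pd.get_dummies(df.explode('val')).groupby(level=0).sum()
    

    NOTE: explode repeats the original index label for every list element, so groupby(level=0) groups the dummy rows by that index and sum() collapses them back to one row per original row. The older spelling .sum(level=0) did the same thing, but it was deprecated in pandas 1.3 and removed in pandas 2.0.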
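
    For reference, here is a minimal end-to-end sketch of the explode-then-dummies approach on the sample data from the question. The column name "val" and the variable names (exploded, indicators, result) are assumptions for illustration, not something given in the question.

```python
import pandas as pd

# Sample data mirroring the question; the column name "val" is an assumption.
df = pd.DataFrame({
    "val": [["AC", "BB"], ["AD", "CB", "FF"], ["AA", "CC"], ["CA", "BB"], ["AA"]]
})

# One row per list element; the original row label is repeated in the index.
exploded = df["val"].explode()

# One indicator column per unique value, then collapse back to
# one row per original index label by summing within each label.
indicators = pd.get_dummies(exploded, dtype=int).groupby(level=0).sum()

# Attach the indicator columns to the original frame.
result = df.join(indicators)
print(result)
```

    Because the rows are summed, a value that appears twice in the same list would produce 2 rather than 1. If strict 0/1 flags are needed, replacing .sum() with .max() should give that.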