Tags: python, one-hot-encoding, apriori

How to One Hot Encode Dataframe for Association Rule Analysis (apriori)


I'm given a data frame that mimics a grocery list:

import pandas as pd

# Each row is one shopping basket; an empty string means nothing was bought in that category.
data = {'Produce': ['Broccoli', 'Spinach', 'Spinach', 'Lettuce', 'Broccoli', 'Lettuce', 'Lettuce'],
        'Dairy': ['Milk', '', 'Milk', 'Cheese', 'Milk', 'Yogurt', 'Yogurt'],
        'Beverage': ['', '', 'Orange Juice', 'Soda', 'Soda', 'Orange Juice', ''],
        'Fruit': ['Broccoli', 'Spinach', 'Spinach', 'Lettuce', 'Broccoli', 'Lettuce', 'Lettuce'],
        'Poultry': ['Chicken Tender', 'Chicken Breasts', 'Chicken Tender', 'Chicken Thigh', 'Chicken Breasts', '', 'Chicken Breasts'],
        'Deli': ['Turkey Breasts', 'Ham', 'Ham', '', '', 'Turkey Breasts', ''],
       }

# Note: 'Poultry' is not listed in columns, so it is left out of df.
df = pd.DataFrame(data, columns=['Produce', 'Dairy', 'Beverage', 'Fruit', 'Deli'])

df

How do I one-hot encode this data frame so that I can run apriori on it? As I understand it, every distinct value should become a column label, and the cell values should be replaced with booleans.


Solution

  • You can try:

    pd.get_dummies(df)
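
    Called with its defaults, pd.get_dummies(df) prefixes every new column with the source column name and turns the empty strings into dummy columns of their own (for example "Dairy_"). If what you want is one boolean column per item, a minimal sketch could look like the following. The shortened sample data and the variable names cleaned and encoded are assumptions made for illustration, and the last step assumes an item repeated across source columns should count once per basket.

```python
import pandas as pd

# Shortened sample in the shape of the question's data (illustrative only).
data = {
    'Produce': ['Broccoli', 'Spinach', 'Lettuce'],
    'Dairy': ['Milk', '', 'Cheese'],
    'Fruit': ['Broccoli', 'Spinach', 'Lettuce'],  # repeats Produce, as in the question
    'Deli': ['Turkey Breasts', 'Ham', ''],
}
df = pd.DataFrame(data)

# Turn empty strings into missing values so they get no dummy column;
# get_dummies skips missing values by default (dummy_na=False).
cleaned = df.mask(df.eq(''))

# Empty prefix and separator keep the bare item names as column labels.
encoded = pd.get_dummies(cleaned, prefix='', prefix_sep='', dtype=bool)

# The same item can come from several source columns, which yields duplicate
# labels; merge them so that each item has exactly one boolean column.
encoded = encoded.T.groupby(level=0).any().T

print(encoded)
```

    The result has one row per basket and one boolean column per distinct item, which is the layout that apriori implementations working on data frames generally expect. Check the input requirements of the library you actually use.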