In Pandas dataframe, one of the column am having is series datatype i.e food_column, from this column I have to extract the output column
Input : food_column
[ 'bread','bread','bread'] ,
[ 'meat','butter','butter'] ,
[ 'meat', 'butter','bread','meat']
['butter']
['bread','meat','bread','meat']
Output : main_column
['bread']
['butter']
['meat']
['butter']
['bread']
condition:
import pandas as pd
import random
from collections import Counter
import numpy as np
food_list = [[ 'bread','bread','bread'] ,
['meat','butter','butter'] ,
['meat', 'butter','bread','meat'],
['butter'],
['bread','meat','bread','meat'],
['']]
food_series = pd.Series(food_list)
df = pd.DataFrame({'food_column': food_series})
# randomize list item order, since dict item order is constant in Python 3.6+
df['random_food_list'] = [random.sample(z, len(z)) for z in df['food_column'].to_list()]
# get counts
df['food_counts'] = df['random_food_list'].apply(lambda x: Counter(x))
# get key with max value
df['main_column'] = df['food_counts'].apply(lambda x: max(x, key=x.get))
# replace empty strings with 'unknown'
df['main_column'] = np.where(df['main_column'] == '', 'unknown', df['main_column'])