python pandas dataframe wordnet

Adding entries to a column in a pandas DataFrame row-by-row using a dictionary


I am attempting to build a word cloud of cuisine types and want each cuisine's count to include its synonyms. To do that, I'm building a dictionary where each key is a cuisine and each value is the set of its synonyms. For example:

> 'Desserts': {'afters', 'sweet', 'dessert'}

The DataFrame I'm working with has thousands of rows, but something very similar can be generated using this (note: the synonyms column was added for this exercise):

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'primary_cuisine':['fast food','desserts','chinese','american','asian','turkish'],
    'synonyms':['', '', '', '', '', '']
    })

It generates this sample:

  primary_cuisine synonyms
0       fast food         
1        desserts         
2         chinese         
3        american         
4           asian         
5         turkish         

I've generated the list of synonyms for each cuisine as follows:

from nltk.corpus import wordnet as wn  # needs the WordNet corpus: nltk.download('wordnet')

syn_dict = {}
for cuisine in df['primary_cuisine']:
    synonyms = []
    # Collect every lemma name from every synset of the cuisine
    for syn in wn.synsets(cuisine):
        for lm in syn.lemmas():
            synonyms.append(lm.name())
    # Store the values as a set to remove duplicates
    # (an empty set if no synonyms were found)
    syn_dict[cuisine] = set(synonyms)

Here's where I'm stuck: how would I fill the synonyms column of the DataFrame for each key using the values in the dictionary? Any help would be appreciated, and suggestions for a better or easier way to do what I'm trying to accomplish would be great too. Here is what I'm hoping to achieve for the above sample, if it helps:

  primary_cuisine synonyms
0       fast food 
1        desserts afters, sweet, dessert
2         chinese Chinese, Taiwanese, Formosan
3        american American_language, American_English, American,
4           asian Asian, Asiatic
5         turkish Turkish

Solution

  • You can use .map() and a dict like so:

    dct = {'desserts': ['afters', 'sweet', 'dessert']}
    
    df['synonyms'] = df['primary_cuisine'].map(dct)
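
  • Mapping the dictionary directly stores each value as-is in the cell, and any cuisine that is not a key becomes NaN. The desired output in the question shows comma-separated strings instead, so one option is to map a small function that looks up the key and joins the result. The sketch below is only an illustration under a few assumptions: syn_dict is written out by hand as a stand-in for the dictionary built with WordNet in the question, 'american' is left out on purpose to show the missing-key case, join_synonyms is a hypothetical helper name, and the synonyms are sorted only to make the output order predictable (sets are unordered).

```python
import pandas as pd

df = pd.DataFrame({
    'primary_cuisine': ['fast food', 'desserts', 'chinese',
                        'american', 'asian', 'turkish'],
})

# Hand-written stand-in for the syn_dict built in the question (assumption).
# 'american' is deliberately missing to show how absent keys are handled.
syn_dict = {
    'fast food': set(),
    'desserts': {'afters', 'sweet', 'dessert'},
    'chinese': {'Chinese', 'Taiwanese', 'Formosan'},
    'asian': {'Asian', 'Asiatic'},
    'turkish': {'Turkish'},
}

def join_synonyms(cuisine):
    # Hypothetical helper: look up the cuisine, fall back to an empty set
    # when the key is absent, and join the sorted synonyms into one string.
    return ', '.join(sorted(syn_dict.get(cuisine, set())))

# Series.map also accepts a function, applied to each value in the column.
df['synonyms'] = df['primary_cuisine'].map(join_synonyms)
print(df)
```

    Note that the keys have to match the column values exactly, including case, so 'Desserts' would not match 'desserts'. Rows with no synonyms end up as empty strings rather than NaN, which is usually easier to feed into a word cloud.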