
NLTK and Pandas - adding synsets into a list


I want to create a list that is added as a new row to a dataframe.

import nltk
import pandas as pd
from nltk.corpus import wordnet
import numpy as np


Overviewdataframe = pd.DataFrame([])
synonyms = []

for syn in wordnet.synsets("active"):
    for l in syn.lemmas():
        synonyms.append(l.name())
        Overviewdataframe = Overviewdataframe.append(synonyms)
        synonyms = []

Instead, the list is added as a column rather than as a row. Can you help me, please?

Thank you.


Solution

  • TL;DR

    import pandas as pd
    from nltk.corpus import wordnet as wn
    
    wordlist = ['active', 'fan', 'hop', 'grace']
    
    # One dictionary per (word, synset) pair; each becomes one row of the dataframe.
    words2lemmanames = [{'word': word, 'synset': ss.name(), 'lemma_names': ss.lemma_names()}
                        for word in wordlist for ss in wn.synsets(word)]
    df = pd.DataFrame(words2lemmanames)
    print(df)
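The reason the question's frame grew downwards is that pandas treats a flat list as one column of values (one value per row), whereas a list that contains a list is read as one row. The sketch below shows this with pandas alone, reusing the lemma names of active_agent.n.01 that appear further down, so it runs without the WordNet data. It collects the rows first and builds the frame once, which also avoids DataFrame.append (deprecated in pandas 1.4 and removed in 2.0). The loop over a single hard-coded list is only a stand-in for the real loop over synsets.

```python
import pandas as pd

lemma_names = ['active_agent', 'active']

# A flat list becomes ONE column: each item lands in its own row.
as_column = pd.DataFrame(lemma_names)
print(as_column.shape)  # (2, 1)

# A list wrapping the list becomes ONE row: each item lands in its own column.
as_row = pd.DataFrame([lemma_names])
print(as_row.shape)  # (1, 2)

# Collect one list per synset, then build the dataframe in a single call.
rows = []
for names in [lemma_names]:  # stand-in for: for syn in wordnet.synsets("active"): names = syn.lemma_names()
    rows.append(names)
overview = pd.DataFrame(rows)
print(overview)
```

If synsets have different numbers of lemma names, the shorter rows are padded with missing values, which is one reason the answer stores the whole list in a single lemma_names cell instead.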
    

    In Long

    When you query a word through NLTK's WordNet interface, it returns a list of "concepts", also known as "synsets":

    >>> wn.synsets('active')
    
    [Synset('active_agent.n.01'), Synset('active_voice.n.01'), Synset('active.n.03'), Synset('active.a.01'), Synset('active.s.02'), Synset('active.a.03'), Synset('active.s.04'), Synset('active.a.05'), Synset('active.a.06'), Synset('active.a.07'), Synset('active.s.08'), Synset('active.a.09'), Synset('active.a.10'), Synset('active.a.11'), Synset('active.a.12'), Synset('active.a.13'), Synset('active.a.14')]
    

    Each synset has its own list of lemma names, e.g.:

    >>> wn.synsets('active')[0].lemma_names()
    ['active_agent', 'active']
    

    You can also access a synset directly by its name. The usual convention for the name is (i) the first lemma name, then a dot, (ii) the POS tag, then a dot, and (iii) the sense index number.

    >>> wn.synsets('active')[0] == wn.synset('active_agent.n.01')
    True
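To make the naming convention concrete, here is a small sketch that splits a synset name string into its three parts. parse_synset_name is a hypothetical helper written for illustration, not part of NLTK; with real Synset objects you would normally just call ss.pos() or ss.name(). It splits on the last two dots only, on the assumption that any extra dots belong to the lemma part.

```python
from collections import Counter
from typing import Tuple


def parse_synset_name(name: str) -> Tuple[str, str, int]:
    """Hypothetical helper: split 'active_agent.n.01' into (lemma, pos, sense_number).

    rsplit with maxsplit=2 splits only on the last two dots, so a dot inside
    the lemma part (assumed possible) stays in one piece.
    """
    lemma, pos, index = name.rsplit('.', 2)
    return lemma, pos, int(index)


print(parse_synset_name('active_agent.n.01'))  # ('active_agent', 'n', 1)

# Count POS tags over a few of the synset names returned by wn.synsets('active') above.
names = ['active_agent.n.01', 'active_voice.n.01', 'active.n.03', 'active.a.01', 'active.s.02']
pos_counts = Counter(parse_synset_name(n)[1] for n in names)
print(pos_counts)
```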
    

    Finally, given a list of dictionaries (each one a set of key-value pairs describing one row), you can pass it to pandas.DataFrame to convert it into a dataframe: each dictionary becomes a row and its keys become the column names.
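A minimal sketch of that last step, using a single hand-written record that mirrors wn.synsets('active')[0] from above (in the TL;DR the records come from the list comprehension instead). The explode call is an optional extra not used in the answer: it assumes you might prefer one lemma name per row rather than a list inside one cell, and it needs pandas 1.1 or later for ignore_index.

```python
import pandas as pd

# One dictionary per row; the keys become the column names.
records = [
    {'word': 'active', 'synset': 'active_agent.n.01',
     'lemma_names': ['active_agent', 'active']},
]

df = pd.DataFrame(records)
print(df.columns.tolist())  # ['word', 'synset', 'lemma_names']
print(df.shape)  # (1, 3)

# Optional: turn each element of the lemma_names list into its own row,
# repeating the word and synset values alongside it.
long_df = df.explode('lemma_names', ignore_index=True)
print(long_df['lemma_names'].tolist())  # ['active_agent', 'active']
```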