python loops wordnet

Synset function to include synonym in a list


I need to iterate over a list and add synonyms and hyponyms of the words back to the list. For example:

list_of_words = ["bird", "smart", "cool", "happy"]
list_of_words = list_of_words + list_of_words_synonyms + list_of_words_hypnonyms

I'm able to get the synonyms and hyponyms for individual words, but I need to iterate over a list of values.

s = wordnet.synsets(word)[0]

This needs to return a list with the individual synonyms added to the original list.

Expected result is: list_of_words = ["bird", "smart", "cool", "happy", "hen", "cock"..other synonyms of bird, "clever", "intelligent", other synonyms of smart....and so on]

How can I get the synset function to iterate over the list_of_words and include these words in the list? I'm very new to text analysis. Any help is appreciated.


Solution

  • (I'm posting this as a new answer rather than updating my existing one, because the question has changed quite a lot.)

    I finally worked out what wordnet.synsets() returns by installing the "pattern" package and stepping through it in a debugger. Here is code that runs:

    from pattern.en import wordnet
    
    list_of_words = [u"bird", u"smart", u"cool", u"happy"]
    list_of_words_synonyms = []
    list_of_words_hypnonyms = []
    
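    for word in list_of_words:
        sts = wordnet.synsets(word)  # all synsets found for this word
        if sts:  # skip words with no synsets
            st = sts[0]  # use only the first synset
            list_of_words_synonyms.extend(st.synonyms)
            # Take the first sense of each hyponym synset.
            list_of_words_hypnonyms.extend(hs.senses[0] for hs in st.hyponyms())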
    
    list_of_words = list_of_words + list_of_words_synonyms + list_of_words_hypnonyms
    print(list_of_words)
    

    Please note:

    1. Duplicates are not removed. If you need unique words, use a set instead of a list.
    2. Each hyponym has multiple senses, and list_of_words_hypnonyms includes only the first one. To include all of them, replace the corresponding line with: list_of_words_hypnonyms.extend(sense for hs in st.hyponyms() for sense in hs.senses)
    3. A generator expression is used to add the hyponyms to list_of_words_hypnonyms.

    The result is:

    [u'bird', u'smart', u'cool', u'happy', u'bird', u'smart', u'smarting', u'smartness', u'cool', u'dickeybird', u'cock', u'hen', u'nester', u'night bird', u'bird of passage', u'protoavis', u'archaeopteryx', u'Sinornis', u'Ibero-mesornis', u'archaeornis', u'ratite', u'carinate', u'passerine', u'nonpasserine bird', u'bird of prey', u'gallinaceous bird', u'parrot', u'cuculiform bird', u'coraciiform bird', u'apodiform bird', u'caprimulgiform bird', u'piciform bird', u'trogon', u'aquatic bird', u'twitterer']
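
The result above contains 'bird', 'smart' and 'cool' twice, because the synonyms of a word include the word itself. If the original word order matters, a plain set would lose it. Below is a minimal sketch of one way to drop duplicates while keeping first-seen order. The helper name dedupe_keep_order is made up for this illustration and is not part of pattern or WordNet.

```python
def dedupe_keep_order(words):
    """Return the words with duplicates removed, keeping first-seen order.

    Hypothetical helper for illustration; not part of pattern or WordNet.
    """
    seen = set()   # words already emitted
    unique = []    # result, in original order
    for word in words:
        if word not in seen:
            seen.add(word)
            unique.append(word)
    return unique


# A shortened version of the combined list from the answer.
combined = ["bird", "smart", "cool", "happy",
            "bird", "smart", "smarting", "smartness", "cool",
            "dickeybird", "cock", "hen"]

print(dedupe_keep_order(combined))
```

Applied to the answer's code, this would be called once on list_of_words after the three lists have been concatenated.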