How to print all lemma_names of a word without repeating its synonyms and POS tags in NLTK synsets?


I'm trying to find the synsets of words. Here's my code:

from nltk.corpus import wordnet as wn
from nltk import pos_tag

def getSynonyms(word1):
    synonymList1 = []
    for data1 in word1:
        wordnetSynset1 = wn.synsets(data1)
        tempList1=[]
        for synset1 in wordnetSynset1:
            synLemmas = synset1.lemma_names()
            for i in xrange(len(synLemmas)):
                word = synLemmas[i].replace('_',' ')
                tempList1.append(pos_tag(word.split()))
        synonymList1.append(tempList1)
    return synonymList1

word1 = ['study']

syn1 = getSynonyms(word1)

print syn1

And here's the output:

[[[(u'survey', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'work', 'NN')], [(u'report', 'NN')], [(u'study', 'NN')], [(u'written', 'VBN'), (u'report', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'discipline', 'NN')], [(u'subject', 'NN')], [(u'subject', 'JJ'), (u'area', 'NN')], [(u'subject', 'JJ'), (u'field', 'NN')], [(u'field', 'NN')], [(u'field', 'NN'), (u'of', 'IN'), (u'study', 'NN')], [(u'study', 'NN')], [(u'bailiwick', 'NN')], [(u'sketch', 'NN')], [(u'study', 'NN')], [(u'cogitation', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'analyze', 'NN')], [(u'analyse', 'NN')], [(u'study', 'NN')], [(u'examine', 'NN')], [(u'canvass', 'NN')], [(u'canvas', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'consider', 'VB')], [(u'learn', 'NN')], [(u'study', 'NN')], [(u'read', 'NN')], [(u'take', 'VB')], [(u'study', 'NN')], [(u'hit', 'VB'), (u'the', 'DT'), (u'books', 'NNS')], [(u'study', 'NN')], [(u'meditate', 'NN')], [(u'contemplate', 'NN')]]]

As we can see, (u'study', 'NN') appears more than once.

How can I print each synonym only once, without repetition?

In other words, each synonym should appear in the list only one time.


Solution

  • Instead of always appending to the list inside the for loop with tempList1.append(pos_tag(word.split())), first check whether the element you are about to add is already in the list. A simple if statement is enough:

    tagged = pos_tag(word.split())
    if tagged not in tempList1:
        tempList1.append(tagged)
    

    This way, an element will not be added twice.
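The membership check above scans the whole list on every append, which is fine for a single word but can get slow for long lists. As a rough sketch of an alternative, the helper below keeps a set of already-seen entries and preserves first-seen order. The name unique_in_order is hypothetical (it is not part of NLTK), and the sample data is copied from the output in the question rather than produced by calling pos_tag, so the snippet runs without NLTK installed.

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping first-seen order.

    Each item is expected to be a list of (word, tag) tuples, like the
    value returned by nltk.pos_tag. Lists are not hashable, so each item
    is converted to a tuple before being stored in the set.
    """
    seen = set()
    result = []
    for item in items:
        key = tuple(item)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


# Sample entries taken from the output shown in the question.
tagged = [
    [('survey', 'NN')],
    [('study', 'NN')],
    [('study', 'NN')],
    [('written', 'VBN'), ('report', 'NN')],
    [('study', 'NN')],
]

deduped = unique_in_order(tagged)
print(deduped)
```

In getSynonyms, this could be applied to tempList1 just before it is appended to synonymList1, assuming the rest of the function stays as posted.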