
How to check if words are in synsets or not?


I'm trying to compare two lists of words to check whether:

  1. the word1 list contains words that also appear in the synsets of the word2 list

  2. the word2 list contains words that also appear in the synsets of the word1 list

If the words are inside the synsets, the check should return True.

Here's my code:

from nltk.corpus import wordnet as wn

word1 =  ['study', 'car']
word2 =  ['learn', 'motor']

def getSynonyms(word1):
    synonymList1 = []
    for data1 in word1:
        wordnetSynset1 = wn.synsets(data1)
        tempList1=[]
        for synset1 in wordnetSynset1:
            synLemmas = synset1.lemma_names()
            for i in xrange(len(synLemmas)):
                word = synLemmas[i].replace('_',' ')
                if word not in tempList1:
                    tempList1.append(word)
        synonymList1.append(tempList1)
    return synonymList1


def checkSynonyms(word1, word2):
    for i in xrange(len(word1)):
        for j in xrange(len(word2)):
            d1 = getSynonyms(word1)
            d2 = getSynonyms(word2)
            if word1[i] in d2:
                return True
            elif word2[j] in d1:
                return True
            else:
                return False

print word1
print
print word2
print
print getSynonyms(word1)
print
print getSynonyms(word2)
print 
print checkSynonyms(word1, word2)
print

But here's the output:

['study', 'car']

['learn', 'motor']

[[u'survey', u'study', u'work', u'report', u'written report', u'discipline', 
u'subject', u'subject area', u'subject field', u'field', u'field of study', 
u'bailiwick', u'sketch', u'cogitation', u'analyze', u'analyse', u'examine', 
u'canvass', u'canvas', u'consider', u'learn', u'read', u'take', u'hit the 
books', u'meditate', u'contemplate'], [u'car', u'auto', u'automobile', 
u'machine', u'motorcar', u'railcar', u'railway car', u'railroad car', 
u'gondola', u'elevator car', u'cable car']]

[[u'learn', u'larn', u'acquire', u'hear', u'get word', u'get wind', u'pick 
up', u'find out', u'get a line', u'discover', u'see', u'memorize', 
u'memorise', u'con', u'study', u'read', u'take', u'teach', u'instruct', 
u'determine', u'check', u'ascertain', u'watch'], [u'motor', u'drive', 
u'centrifugal', u'motive']]

False

As we can see, the word 'study' from word1 is also in the synsets of word2 (u'study').

Why does it return False?


Solution

  • You want to compare each string in word1 with the strings inside d2, but d2 is a list of lists, so if word1[i] in d2: compares the string with each inner list as a whole. For instance, it will compare:

    'study' == [u'learn', u'larn', u'acquire', u'hear', u'get word', u'get wind', u'pick 
    up', u'find out', u'get a line', u'discover', u'see', u'memorize', 
    u'memorise', u'con', u'study', u'read', u'take', u'teach', u'instruct', 
    u'determine', u'check', u'ascertain', u'watch']
    

    A string never equals a list, so this comparison is always False.

    So, instead of if word1[i] in d2:, use if word1[i] in d2[k]:, where k is an index that loops over the inner lists of d2. The same applies to word2[j] in d1. Also note that the else: return False branch ends the function after the very first pair it checks, so return False only once both loops have finished.

    Hope it will help you.
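
    For reference, here is a minimal sketch of the corrected check. To keep it runnable without NLTK, it takes the synonym lookup as a parameter; check_synonyms, sample_synonyms and the SAMPLE table are names made up for this illustration, and SAMPLE holds only a few lemma names copied from the output in the question. In the real script you would pass a function that wraps wn.synsets(...) the way getSynonyms does.

```python
def check_synonyms(words1, words2, synonyms_of):
    """Return True if a word from one list is a synonym of a word in the other."""
    # Build the synonym collections once, outside the loops.
    syns1 = [set(synonyms_of(w)) for w in words1]
    syns2 = [set(synonyms_of(w)) for w in words2]
    # Test membership against each inner collection, not against the outer list.
    if any(w in syns for w in words1 for syns in syns2):
        return True
    # Only report False after every pair has been checked.
    return any(w in syns for w in words2 for syns in syns1)


# Hypothetical stand-in for the WordNet lookup (assumption, not real WordNet data
# beyond the few lemma names copied from the question's output).
SAMPLE = {
    'study': ['survey', 'study', 'learn', 'read'],
    'car': ['car', 'auto', 'automobile'],
    'learn': ['learn', 'larn', 'study', 'read'],
    'motor': ['motor', 'drive'],
}


def sample_synonyms(word):
    return SAMPLE.get(word, [])


result = check_synonyms(['study', 'car'], ['learn', 'motor'], sample_synonyms)
print(result)
```

    Using sets for the inner collections is just a convenience for fast membership tests; plain lists would work the same way here.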