I have a set of words:
{corporal, dog, cat, distingus, Company, phone, authority, vhicule, seats, lightweight, rules, resident, expertise}
I want to compute the semantic similarity between every pair of words in the set above, but I have one problem.
Sample code (from "Python: Passing variables into Wordnet Synsets methods in NLTK"):
import itertools as IT
import nltk.corpus as corpus

if __name__ == "__main__":
    wordnet = corpus.wordnet
    list1 = ["apple", "honey", "drinks", "flowers", "paper"]
    list2 = ["pear", "shell", "movie", "fire", "tree"]
    for word1, word2 in IT.product(list1, list2):
        # Take the first synset of each word; indexing with [0] raises
        # IndexError when WordNet has no entry for the word.
        wordFromList1 = wordnet.synsets(word1)[0]
        wordFromList2 = wordnet.synsets(word2)[0]
        # In NLTK 3 and later, name() is a method rather than an attribute.
        print('{w1}, {w2}: {s}'.format(
            w1=wordFromList1.name(),
            w2=wordFromList2.name(),
            s=wordFromList1.wup_similarity(wordFromList2)))
Suppose that I add "vhicule" to either of the lists. I get the following error:
IndexError: list index out of range
How can I use this error to ignore the words that don't exist in the database?
You can check whether nltk.corpus.wordnet.synsets(i) returns a non-empty list of synsets:
>>> from nltk.corpus import wordnet as wn
>>> x = [i.strip() for i in """corporal, dog, cat, distingus, Company, phone, authority, vhicule, seats, lightweight, rules, resident, expertise""".lower().split(",")]
>>> x
['corporal', 'dog', 'cat', 'distingus', 'company', 'phone', 'authority', 'vhicule', 'seats', 'lightweight', 'rules', 'resident', 'expertise']
>>> y = [i for i in x if len(wn.synsets(i)) > 0]
>>> y
['corporal', 'dog', 'cat', 'company', 'phone', 'authority', 'seats', 'lightweight', 'rules', 'resident', 'expertise']
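An even less verbose way relies on the fact that wn.synsets(i) returns an empty list (not None) for unknown words, and an empty list is falsy, so the call can be used directly as the condition: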
>>> from nltk.corpus import wordnet as wn
>>> x = [i.strip() for i in """corporal, dog, cat, distingus, Company, phone, authority, vhicule, seats, lightweight, rules, resident, expertise""".lower().split(",")]
>>> x
['corporal', 'dog', 'cat', 'distingus', 'company', 'phone', 'authority', 'vhicule', 'seats', 'lightweight', 'rules', 'resident', 'expertise']
>>> [i for i in x if wn.synsets(i)]
['corporal', 'dog', 'cat', 'company', 'phone', 'authority', 'seats', 'lightweight', 'rules', 'resident', 'expertise']