I need to work on a project that requires NLTK, so I started learning Python two weeks ago, but I am struggling to understand both Python and NLTK.
From the NLTK documentation, I can understand the following code, and it works well if I manually put the words apple and pear into it.
from nltk.corpus import wordnet as wn
apple = wn.synset('apple.n.01')
pear = wn.synset('pear.n.01')
print apple.lch_similarity(pear)
Output: 2.53897387106
However, I need to use NLTK to work with lists of items. For example, I have the two lists below and would like to compare the items in list1 with those in list2: compare word1 from list1 with every word in list2, then word2 from list1 with every word in list2, and so on until every word in list1 has been compared.
list1 = ["apple", "honey", "drinks", "flowers", "paper"]
list2 = ["pear", "shell", "movie", "fire", "tree", "candle"]
wordFromList1 = list1[0]
wordFromList2 = list2[0]
wordFromList1 = wn.synset(wordFromList1)
wordFromList2 = wn.synset(wordFromList2)
print wordFromList1.lch_similarity(wordFromList2)
The code above will of course give an error. Can anyone show me how to pass a variable into the synset method [wn.synset(*pass_variable_in_here*)] so that I can use a double loop to get the lch_similarity values? Thank you.
wordnet.synset expects a 3-part name string of the form: word.pos.nn. You did not specify the pos.nn part for each word in list1 and list2.
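To make the three parts concrete, here is a minimal sketch that builds and splits such name strings. The helper names make_synset_name and split_synset_name are hypothetical (they are not part of NLTK), and a well-formed name is not guaranteed to exist in WordNet.

```python
def make_synset_name(word, pos="n", sense=1):
    """Build a 'word.pos.nn' string, e.g. 'apple.n.01' (hypothetical helper)."""
    return "{}.{}.{:02d}".format(word, pos, sense)


def split_synset_name(name):
    """Split 'word.pos.nn' into (word, pos, sense number)."""
    # rsplit from the right so only the last two dots act as separators.
    word, pos, sense = name.rsplit(".", 2)
    return word, pos, int(sense)


print(make_synset_name("apple"))
print(split_synset_name("drink_in.v.01"))
```

A variable can then be passed as wn.synset(make_synset_name(word)), provided that synset actually exists.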
It seems reasonable to assume that all the words are nouns, so we could try appending the string '.n.01' to each string in list1 and list2:
for word1, word2 in IT.product(list1, list2):
    wordFromList1 = wordnet.synset(word1+'.n.01')
    wordFromList2 = wordnet.synset(word2+'.n.01')
That does not work, however. wordnet.synset('drinks.n.01') raises a WordNetError.
On the other hand, the same doc page shows you can look up similar words using the synsets method:
For example, wordnet.synsets('drinks') returns the list:
[Synset('drink.n.01'),
Synset('drink.n.02'),
Synset('beverage.n.01'),
Synset('drink.n.04'),
Synset('swallow.n.02'),
Synset('drink.v.01'),
Synset('drink.v.02'),
Synset('toast.v.02'),
Synset('drink_in.v.01'),
Synset('drink.v.05')]
So at this point, you need to give some thought to what you want the program to do. If you are okay with just picking the first item in this list as a proxy for drinks, then you could use
for word1, word2 in IT.product(list1, list2):
    wordFromList1 = wordnet.synsets(word1)[0]
    wordFromList2 = wordnet.synsets(word2)[0]
which would result in a program that looks like this:
import nltk.corpus as corpus
import itertools as IT

wordnet = corpus.wordnet
list1 = ["apple", "honey", "drinks", "flowers", "paper"]
list2 = ["pear", "shell", "movie", "fire", "tree", "candle"]

# IT.product yields every (word1, word2) pair, like a nested for loop
for word1, word2 in IT.product(list1, list2):
    # print(word1, word2)
    # Use the first synset found as a proxy for each word
    wordFromList1 = wordnet.synsets(word1)[0]
    wordFromList2 = wordnet.synsets(word2)[0]
    # In NLTK 3 and later, name() is a method; older releases used the attribute .name
    print('{w1}, {w2}: {s}'.format(
        w1 = wordFromList1.name(),
        w2 = wordFromList2.name(),
        s = wordFromList1.lch_similarity(wordFromList2)))
which yields
apple.n.01, pear.n.01: 2.53897387106
apple.n.01, shell.n.01: 1.07263680226
apple.n.01, movie.n.01: 1.15267950994
apple.n.01, fire.n.01: 1.07263680226
...
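Note that the loop above takes the first synset whatever its part of speech, and indexing with [0] raises an IndexError for a word that has no synsets at all. A more defensive variant might look like the sketch below. The names pairwise_lch, noun_synsets and the lookup parameter are hypothetical; only wordnet.synsets(word, pos=wordnet.NOUN) and lch_similarity come from NLTK.

```python
import itertools


def pairwise_lch(words1, words2, lookup):
    """Return (word1, word2, score) for each pair where both words have a synset.

    `lookup` is a hypothetical parameter: any callable that maps a word to a
    list of synset objects (possibly empty).
    """
    results = []
    # itertools.product yields the same pairs, in the same order, as
    # "for w1 in words1: for w2 in words2:".
    for w1, w2 in itertools.product(words1, words2):
        s1, s2 = lookup(w1), lookup(w2)
        if not s1 or not s2:
            continue  # skip words with no matching synset
        results.append((w1, w2, s1[0].lch_similarity(s2[0])))
    return results


def noun_synsets(word):
    """Look up noun synsets only, so both sides share a part of speech."""
    # Imported lazily; requires NLTK and its WordNet corpus to be installed.
    from nltk.corpus import wordnet
    return wordnet.synsets(word, pos=wordnet.NOUN)
```

With NLTK and the WordNet data available, pairwise_lch(list1, list2, noun_synsets) would return one tuple per word pair, still using the first noun synset as the proxy for each word.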