Python: Passing variables into Wordnet Synsets methods in NLTK


I need to work on a project that requires NLTK, so I started learning Python two weeks ago, but I am struggling to understand both Python and NLTK.

From the NLTK documentation, I can understand the following code, and it works well if I manually type the words apple and pear into it.

from nltk.corpus import wordnet as wn

apple = wn.synset('apple.n.01')
pear = wn.synset('pear.n.01')

print(apple.lch_similarity(pear))

Output: 2.53897387106

However, I need to use NLTK with a list of items. For example, I have the two lists below and I would like to compare each item in list1 with every item in list2: word1 from list1 with every word in list2, then word2 from list1 with every word in list2, and so on until all words in list1 have been compared.

list1 = ["apple", "honey", "drinks", "flowers", "paper"]
list2 = ["pear", "shell", "movie", "fire", "tree", "candle"]

wordFromList1 = list1[0]
wordFromList2 = list2[0]

wordFromList1 = wn.synset(wordFromList1)
wordFromList2 = wn.synset(wordFromList2)    

print(wordFromList1.lch_similarity(wordFromList2))

The code above of course gives an error. Can anyone show me how to pass a variable into the synset method [wn.synset(*pass_variable_in_here*)] so that I can use a double loop to get the lch_similarity values? Thank you.


Solution

  • wordnet.synset expects a 3-part name string of the form: word.pos.nn.

    You did not specify the pos.nn part for each word in list1 and list2.
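
As a minimal sketch of how a variable can be turned into such a name, the hypothetical helper synset_name below (not part of NLTK) joins a word, a part-of-speech tag, and a sense number into the word.pos.nn form. It only builds the string; whether WordNet actually contains a synset with that name is a separate question.

```python
def synset_name(word, pos="n", sense=1):
    """Build a name like 'apple.n.01' (hypothetical helper, not an NLTK function)."""
    # WordNet writes multiword lemmas with underscores, e.g. 'drink_in'.
    lemma = word.strip().replace(" ", "_")
    # The sense number is zero-padded to two digits.
    return "{}.{}.{:02d}".format(lemma, pos, sense)


print(synset_name("apple"))          # apple.n.01
print(synset_name("drink in", "v"))  # drink_in.v.01
```

The resulting string can then be passed in as a variable, for example wn.synset(synset_name(word)).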

    It seems reasonable to assume that all the words are nouns, so we could try appending the string '.n.01' to each string in list1 and list2:

    for word1, word2 in IT.product(list1, list2):
        wordFromList1 = wordnet.synset(word1+'.n.01')
        wordFromList2 = wordnet.synset(word2+'.n.01')
    

    That does not work, however. wordnet.synset('drinks.n.01') raises a WordNetError.

    On the other hand, the NLTK WordNet documentation shows that you can look up the candidate synsets for a word using the synsets method.

    For example, wordnet.synsets('drinks') returns the list:

    [Synset('drink.n.01'),
     Synset('drink.n.02'),
     Synset('beverage.n.01'),
     Synset('drink.n.04'),
     Synset('swallow.n.02'),
     Synset('drink.v.01'),
     Synset('drink.v.02'),
     Synset('toast.v.02'),
     Synset('drink_in.v.01'),
     Synset('drink.v.05')]
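
Note that this list mixes noun and verb senses. A minimal sketch, assuming only noun senses are wanted: synsets accepts an optional pos argument, and the hypothetical helper first_synset below returns the first match, or None when WordNet has no entry for the word. The wn parameter is assumed to be the object imported with "from nltk.corpus import wordnet as wn". Keeping both words to the same part of speech also matters because lch_similarity is only defined between synsets that share one.

```python
def first_synset(word, wn, pos="n"):
    """Return the first synset of `word` with the given part of speech, or None.

    Hypothetical helper; `wn` is assumed to be nltk.corpus.wordnet,
    and "n" is the tag WordNet uses for nouns.
    """
    candidates = wn.synsets(word, pos=pos)
    # An empty list means WordNet has no matching entry for this word.
    return candidates[0] if candidates else None


# Usage, assuming NLTK and its WordNet data are installed:
#   from nltk.corpus import wordnet as wn
#   synset = first_synset("drinks", wn)
```

Returning None instead of indexing with [0] directly avoids an IndexError for words that WordNet does not know.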
    

    So at this point, you need to give some thought to what you want the program to do. If you are okay with just picking the first item in this list as a proxy for drinks, then you could use

    for word1, word2 in IT.product(list1, list2):
        wordFromList1 = wordnet.synsets(word1)[0]
        wordFromList2 = wordnet.synsets(word2)[0]
    

    which would result in a program that looks like this:

    import nltk.corpus as corpus
    import itertools as IT
    
    wordnet = corpus.wordnet
    list1 = ["apple", "honey", "drinks", "flowers", "paper"]
    list2 = ["pear", "shell", "movie", "fire", "tree", "candle"]
    
    for word1, word2 in IT.product(list1, list2):
        # print(word1, word2)
        wordFromList1 = wordnet.synsets(word1)[0]
        wordFromList2 = wordnet.synsets(word2)[0]
        print('{w1}, {w2}: {s}'.format(
            # name() is a method in NLTK 3; older releases used the attribute .name
            w1 = wordFromList1.name(),
            w2 = wordFromList2.name(),
            s = wordFromList1.lch_similarity(wordFromList2)))
    

    which yields

    apple.n.01, pear.n.01: 2.53897387106
    apple.n.01, shell.n.01: 1.07263680226
    apple.n.01, movie.n.01: 1.15267950994
    apple.n.01, fire.n.01: 1.07263680226
    ...
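
Since the question asks for a double loop, here is a small sketch showing that itertools.product (imported as IT above) visits the same pairs, in the same order, as two nested for loops. The shortened lists are for illustration only.

```python
import itertools

list1 = ["apple", "honey"]
list2 = ["pear", "shell", "movie"]

# Explicit double loop: every word in list1 paired with every word in list2.
nested = []
for word1 in list1:
    for word2 in list2:
        nested.append((word1, word2))

# itertools.product yields the same pairs without the nesting.
pairs = list(itertools.product(list1, list2))

print(pairs == nested)  # True
print(len(pairs))       # 6
```

Either form works; product simply keeps the loop body one indentation level shallower.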