
AttributeError: 'unicode' object has no attribute 'wup_similarity'


I am playing with the nltk module in Python 2.7. Here is my code:

from nltk.corpus import wordnet as wn

listsyn1 = []
listsyn2 = []

for synset in wn.synsets('dog', pos=wn.NOUN):
    print synset.name()
    for lemma in synset.lemmas():
        listsyn1.append(lemma.name())

for synset in wn.synsets('paw', pos=wn.NOUN):
    print synset.name()
    for lemma in synset.lemmas():
        listsyn2.append(lemma.name())

countsyn1 = len(listsyn1)
countsyn2 = len(listsyn2)

sumofsimilarity = 0;
for firstgroup in listsyn1:
    for secondgroup in listsyn2:
        print(firstgroup.wup_similarity(secondgroup))
        sumofsimilarity = sumofsimilarity + firstgroup.wup_similarity(secondgroup)

averageofsimilarity = sumofsimilarity/(countsyn1*countsyn2)

I get the error "AttributeError: 'unicode' object has no attribute 'wup_similarity'" when I try to run this code. Thanks for any help.


Solution

  • The similarity measures can only be accessed from Synset objects, not from Lemma objects or lemma names (which are plain str values).

    dog = wn.synsets('dog', 'n')[0]
    paw = wn.synsets('paw', 'n')[0]
    
    print(type(dog), type(paw), dog.wup_similarity(paw))
    

    [out]:

    <class 'nltk.corpus.reader.wordnet.Synset'> <class 'nltk.corpus.reader.wordnet.Synset'> 0.21052631578947367
    

    When you call .lemmas() on a Synset object and then .name() on a Lemma, you get a str:

    dog = wn.synsets('dog', 'n')[0]
    print(type(dog), dog)
    print(type(dog.lemmas()[0]), dog.lemmas()[0])
    print(type(dog.lemmas()[0].name()), dog.lemmas()[0].name())
    

    [out]:

    <class 'nltk.corpus.reader.wordnet.Synset'> Synset('dog.n.01')
    <class 'nltk.corpus.reader.wordnet.Lemma'> Lemma('dog.n.01.dog')
    <class 'str'> dog
    

    You can use the hasattr function to check which objects/types expose a given function or attribute:

    dog = wn.synsets('dog', 'n')[0]
    print(hasattr(dog, 'wup_similarity'))
    print(hasattr(dog.lemmas()[0], 'wup_similarity'))
    print(hasattr(dog.lemmas()[0].name(), 'wup_similarity'))
    

    [out]:

    True
    False
    False
    

    Most probably, you want a function similar to https://github.com/alvations/pywsd/blob/master/pywsd/similarity.py#L76, which maximizes wup_similarity across two sets of synsets. But note that there are many caveats, such as the pre-lemmatization that is necessary.

    That may be why you are trying to sidestep it with .lemma_names(). Perhaps you can do this:

    from itertools import chain, product
    from nltk.corpus import wordnet as wn

    def ss_lnames(word):
        return set(chain(*[ss.lemma_names() for ss in wn.synsets(word, 'n')]))
    
    dog_lnames = ss_lnames('dog')
    paw_lnames = ss_lnames('paw')
    
    for dog_name, paw_name in product(dog_lnames, paw_lnames):
        for dog_ss, paw_ss in product(wn.synsets(dog_name, 'n'), wn.synsets(paw_name, 'n')):
            print(dog_ss, paw_ss, dog_ss.wup_similarity(paw_ss))  
    

    But most probably the results will be uninterpretable and unreliable, since there is no word sense disambiguation going on prior to the synset lookup, in both the outer and inner loops.