
How to get Sense Key in WordNet for NLTK Python?


Hi Stackoverflow Community

I just started tinkering with Python NLTK and have turned my attention to the WordNet module.

I am trying to get the sense key for a given lemma and found the following:

s = wn.synset('skill.n.01')
s.lemmas # >>> [Lemma('skill.n.01.skill'), ... ]
s.lemmas[0].key # >>> 'skill%1:09:01::'

However, this implementation doesn't seem to be supported anymore.

Traceback (most recent call last):
File "C:/Users/Admin/PycharmProjects/momely/placementarchitect/testbench.py", line 59, in <module>
s.lemmas[0].key
TypeError: 'method' object is not subscriptable

Could anyone point me in the right direction on how to get the sense key for a given lemma or synset?

Any advice would be highly appreciated!


Solution

  • See https://stackoverflow.com/a/27518899/610569 for the difference between Synset.lemmas()[0].key and Synset.lemmas()[0].key(). In NLTK 3, lemmas and key are methods rather than attributes, so both must be called with parentheses:

    >>> from nltk.corpus import wordnet as wn
    >>> wn.synset('dog.n.1')
    Synset('dog.n.01')
    >>> wn.synset('dog.n.1').lemmas()
    [Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
    >>> wn.synset('dog.n.1').lemmas()[0]
    Lemma('dog.n.01.dog')
    >>> wn.synset('dog.n.1').lemmas()[0].name()
    u'dog'
    
    # To retrieve Princeton WordNet style keys.
    >>> wn.synset('dog.n.1').lemmas()[0].key()
    u'dog%1:05:00::'
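
As a rough sketch of how the calls above could be wrapped up, the helper below collects the sense key of every lemma in a synset. The names `sense_keys` and `_value` are hypothetical, not part of NLTK, and the `callable` check is an assumption made here so that the same code tolerates both the older attribute style (`s.lemmas`, `lemma.key`) and the NLTK 3 method style (`s.lemmas()`, `lemma.key()`).

```python
def _value(attr):
    """Call attr if it is a method (NLTK 3 style); otherwise return it unchanged."""
    # Hypothetical helper: in older NLTK releases `lemmas` and `key` were plain
    # attributes (a list and a string), which are not callable.
    return attr() if callable(attr) else attr


def sense_keys(synset):
    """Return the sense key of every lemma in a synset-like object."""
    return [_value(lemma.key) for lemma in _value(synset.lemmas)]
```

With NLTK and the WordNet corpus installed, `sense_keys(wn.synset('skill.n.01'))` should return a list whose first entry is the key quoted in the question, going by the output shown there.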
    

    For Open Multilingual WordNet, using the offset + pos keys would be easier, e.g.:

    >>> from nltk.corpus import wordnet as wn
    >>> ss = wn.synset('dog.n.1')
    >>> ss.offset()
    2084071
    >>> ss.pos()
    u'n'
    >>> '{}-{}'.format(str(ss.offset()).zfill(8), ss.pos())
    '02084071-n'
    

    Searching for the offset + pos key (e.g. 02084071-n) on the OMW interface at http://compling.hss.ntu.edu.sg/omw/cgi-bin/wn-gridx.cgi?gridmode=grid takes you to a visualization page for the synset.
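
The offset + pos formatting shown above can be sketched as a pair of small functions. Both names (`omw_key`, `split_omw_key`) are hypothetical and exist only for illustration; the only assumption about the key format is the one visible in the example, namely an offset zero-padded to 8 digits, a hyphen, and the part-of-speech letter.

```python
def omw_key(offset, pos):
    """Format a synset offset and part of speech as an 'offset-pos' key."""
    # Zero-pad the integer offset to 8 digits, matching the example above.
    return '{:08d}-{}'.format(offset, pos)


def split_omw_key(key):
    """Split an 'offset-pos' key back into (integer offset, pos)."""
    offset, pos = key.split('-')
    return int(offset), pos
```

For the synset in the example, `omw_key(ss.offset(), ss.pos())` gives the same string as the `zfill` expression. Recent NLTK 3 releases also appear to offer `wn.synset_from_pos_and_offset(pos, offset)` for turning such a pair back into a synset, though whether it is available depends on the installed version.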