
Incomplete list of synset hypernyms in NLTK's WordNet?


While retrieving the hypernyms of a WordNet synset through NLTK's WordNet interface, I get results that seem to differ from those of WordNet's web search interface. For example:

from nltk.corpus import wordnet as wn
bank6ss = wn.synsets("bank")[5]  # 'bank' as gambling house funds
bank6ss.hypernyms()
# returns [Synset('funds.n.01')]

That is, only one hypernym is returned (and no others turn up with, for instance, instance_hypernyms()). However, WordNet's web interface lists several entries for this sense of 'bank' under "Direct hypernym":

funds, finances, monetary resource, cash in hand, pecuniary resource

What would explain this difference, and how could I get that longer list of hypernyms in NLTK's WordNet?

The WordNet version used in my NLTK installation is 3.0.


Solution

  • I just realized that I was comparing two different kinds of output: NLTK's WordNet returns a hypernym synset (Synset('funds.n.01')), while the hypernyms listed in the web interface are the lemmas belonging to that one synset.

    To fully answer the question, this list of lemmas can be recovered in NLTK as follows:

    from nltk.corpus import wordnet as wn
    bank6ss = wn.synsets("bank")[5]  # 'bank' as gambling house funds
    hn1ss = bank6ss.hypernyms()[0]
    hn1ss.lemmas()
    # returns [Lemma('funds.n.01.funds'), 
    #   Lemma('funds.n.01.finances'),
    #   Lemma('funds.n.01.monetary_resource'), 
    #   Lemma('funds.n.01.cash_in_hand'),
    #   Lemma('funds.n.01.pecuniary_resource')]
    

    Or, if only lemma names are of interest:

    hn1ss.lemma_names()
    # returns [u'funds',
    #   u'finances',
    #   u'monetary_resource',
    #   u'cash_in_hand',
    #   u'pecuniary_resource']
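
The two steps above (take each hypernym synset, then read its lemma names) can be folded into one helper. This is only a minimal sketch: the function name hypernym_lemma_names is made up for illustration, and it assumes the NLTK 3 interface, in which name() and lemma_names() are methods rather than attributes. Underscores are replaced with spaces so the names read the way the web interface shows them.

```python
def hypernym_lemma_names(synset):
    """Map each direct hypernym of `synset` to its lemma names.

    Hypothetical helper, not part of NLTK. It covers both ordinary
    and instance hypernyms and assumes the NLTK 3 Synset methods
    hypernyms(), instance_hypernyms(), name() and lemma_names().
    """
    result = {}
    for hyper in synset.hypernyms() + synset.instance_hypernyms():
        # WordNet stores multi-word lemmas with underscores;
        # show them with spaces, as the web interface does.
        result[hyper.name()] = [
            lemma.replace("_", " ") for lemma in hyper.lemma_names()
        ]
    return result


# Usage (requires NLTK with the WordNet corpus downloaded):
#   from nltk.corpus import wordnet as wn
#   bank6ss = wn.synsets("bank")[5]
#   print(hypernym_lemma_names(bank6ss))
```

Returning a dictionary keyed by synset name keeps the distinction from the answer visible: each key is one hypernym synset, and each value is the list of lemmas that the web interface prints on a single "Direct hypernym" line.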