Tags: python, nltk, wordnet

How to calculate the deepest node in WordNet using NLTK?


Is there built-in functionality to find the lowest word in a word hierarchy using NLTK? For example, if there were no edge between 'placenta' and 'carnivore' in the first graph at http://www.randomhacks.net/2009/12/29/visualizing-wordnet-relationships-as-graphs/, the lowest words would be 'placenta' and 'carnivore' (both having distance 10 from 'entity').
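The "distance 10 from 'entity'" above is the length of a hypernym path from a synset up to the root. A WordNet synset can have more than one hypernym, so that distance is not unique: there is a shortest and a longest path. NLTK exposes the two as Synset.min_depth() and Synset.max_depth(). The sketch below computes both on a small hypothetical hierarchy (PARENTS is made-up data, not WordNet) to show how the two values can differ for a node with multiple parents.

```python
from functools import lru_cache

# Hypothetical toy hierarchy (not WordNet data): each node maps to its parents
# (hypernyms). 'c' has two parents, like a synset with multiple inheritance.
PARENTS = {
    "entity": [],
    "a": ["entity"],
    "b": ["a"],
    "c": ["entity", "b"],
    "d": ["c"],
}


@lru_cache(maxsize=None)
def min_depth(node):
    """Length of the shortest path from node up to the root."""
    parents = PARENTS[node]
    return 0 if not parents else 1 + min(min_depth(p) for p in parents)


@lru_cache(maxsize=None)
def max_depth(node):
    """Length of the longest path from node up to the root."""
    parents = PARENTS[node]
    return 0 if not parents else 1 + max(max_depth(p) for p in parents)


for node in PARENTS:
    print(node, min_depth(node), max_depth(node))
# entity 0 0
# a 1 1
# b 2 2
# c 1 3
# d 2 4
```

On real data the same two numbers come from calling min_depth() or max_depth() on an NLTK Synset, for example wn.synset('carnivore.n.01').max_depth().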


Solution

  • You can collect the synsets that have no hyponyms, i.e. the leaves of the hierarchy:

    from nltk.corpus import wordnet as wn
    
    # Keep every synset that has no (regular) hyponyms, i.e. the leaves.
    lowest_level = set()
    
    for ss in wn.all_synsets():
        if not ss.hyponyms():
            lowest_level.add(ss)
    
    len(lowest_level) # 97651
    

    If you would also like to exclude synsets that have instance hyponyms, which hyponyms() does not return:

    from nltk.corpus import wordnet as wn
    
    # hyponyms() does not include instance hyponyms, so check both lists.
    lowest_level = set()
    
    for ss in wn.all_synsets():
        if not ss.hyponyms() and not ss.instance_hyponyms():
            lowest_level.add(ss)
    
    len(lowest_level) # 97187
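The loops above return leaves, but leaves sit at very different depths, so on their own they do not say which nodes are deepest. Also, wn.all_synsets() with no argument includes adjective and adverb synsets, which have no hyponym links, so every one of them ends up in lowest_level; passing wn.NOUN restricts the search to the noun hierarchy rooted at entity.n.01. One way to go from leaves to the deepest nodes is to compute each node's longest-path depth from the root and keep the leaves with the maximum value. The sketch below does this with a Kahn-style topological traversal over a `children` mapping; the helper name deepest_leaves and the toy data are hypothetical illustrations, not part of NLTK.

```python
from collections import deque


def deepest_leaves(children, root):
    """Return (depth, leaves) for the leaves farthest from root.

    Hypothetical helper (not an NLTK API). `children` maps each node to its
    child nodes (e.g. hyponyms); the graph is assumed to be acyclic with every
    node reachable from `root`. Depth is the longest path from `root`, the
    same notion as NLTK's Synset.max_depth().
    """
    # Count incoming edges so each node is finalized only after all its parents.
    indegree = {node: 0 for node in children}
    for kids in children.values():
        for kid in kids:
            indegree[kid] = indegree.get(kid, 0) + 1

    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for kid in children.get(node, []):
            # Keep the longest distance seen so far over all parents.
            depth[kid] = max(depth.get(kid, 0), depth[node] + 1)
            indegree[kid] -= 1
            if indegree[kid] == 0:
                queue.append(kid)

    # Leaves are nodes without children; keep those at the maximum depth.
    leaves = [node for node in depth if not children.get(node)]
    best = max(depth[node] for node in leaves)
    return best, sorted(node for node in leaves if depth[node] == best)


# Hypothetical toy DAG (not WordNet data); 'd' has two parents, 'b' and 'c',
# so the shortest path to 'e' has length 3 but the longest has length 4.
CHILDREN = {
    "entity": ["a", "b"],
    "a": ["c"],
    "b": ["d"],
    "c": ["d", "f"],
    "d": ["e"],
    "e": [],
    "f": [],
}

print(deepest_leaves(CHILDREN, "entity"))  # (4, ['e'])
```

With NLTK you probably do not need to build the graph yourself, since Synset.max_depth() already returns the longest-path depth: max(s.max_depth() for s in wn.all_synsets(wn.NOUN)) gives the greatest depth, and filtering the noun synsets on that value yields the deepest ones. Under longest-path depth a deepest node is necessarily a leaf, because every child is at least one level deeper than each of its parents.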