How to determine semantic hierarchies/relations using NLTK?


I want to use NLTK and WordNet to determine the semantic relation between two words. For example, if I enter "employee" and "waiter", it should return something showing that "employee" is more general than "waiter"; for "employee" and "worker", it should return "equal". Does anyone know how to do that?


Solution

  • First, you have to tackle the problem of getting from words to lemmas and then to synsets, i.e. how do you identify a synset from a word?

    word => lemma => lemma.pos.sense => synset    
    Waiters => waiter => 'waiter.n.01' => wn.synset('waiter.n.01')
    

    So let's say you have already dealt with the above problem and arrived at the rightmost representation of waiter; then you can go on to compare synsets. Note that a word can have many synsets, so you have to choose the sense you actually mean.

    from nltk.corpus import wordnet as wn

    # Look up the specific senses (synsets) to compare (NLTK 3 API).
    waiter = wn.synset('waiter.n.01')
    employee = wn.synset('employee.n.01')

    # Lemma names of every synset below each one in the hierarchy;
    # closure() follows hyponyms() transitively.
    all_hyponyms_of_waiter = {w.replace("_", " ") for s in waiter.closure(lambda x: x.hyponyms()) for w in s.lemma_names()}
    all_hyponyms_of_employee = {w.replace("_", " ") for s in employee.closure(lambda x: x.hyponyms()) for w in s.lemma_names()}

    if 'waiter' in all_hyponyms_of_employee:
        print('employee more general than waiter')
    elif 'employee' in all_hyponyms_of_waiter:
        print('waiter more general than employee')
    else:
        print("Neither word is a hyponym of the other in WordNet's hierarchy")
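For the first step, getting from a surface word to its candidate synsets, here is a minimal stdlib-only sketch under stated assumptions: `TOY_SENSES` is a hypothetical stand-in for WordNet's sense inventory, and `toy_morphy` applies a subset of the noun suffix rules that WordNet's morphy uses (it skips morphy's exception lists). In NLTK 3 the real lookups are `wn.morphy(word, wn.NOUN)` and `wn.synsets(lemma, pos=wn.NOUN)`.

```python
# Hypothetical toy sense inventory standing in for WordNet's database:
# lemma -> candidate synset names in "lemma.pos.sense" form.
TOY_SENSES = {
    "waiter": ["waiter.n.01", "waiter.n.02"],
    "employee": ["employee.n.01"],
    "box": ["box.n.01"],
}

# A subset of the noun suffix substitutions used by WordNet's morphy.
NOUN_SUFFIX_RULES = [
    ("s", ""), ("ses", "s"), ("xes", "x"),
    ("ches", "ch"), ("ies", "y"), ("men", "man"),
]


def toy_morphy(word):
    """Return a base form of `word` that exists in TOY_SENSES, or None."""
    form = word.lower()
    if form in TOY_SENSES:
        return form
    for suffix, replacement in NOUN_SUFFIX_RULES:
        if form.endswith(suffix):
            candidate = form[: len(form) - len(suffix)] + replacement
            if candidate in TOY_SENSES:
                return candidate
    return None


def word_to_synset_names(word):
    """word -> lemma -> list of candidate synset names (empty if unknown)."""
    return TOY_SENSES.get(toy_morphy(word), [])


print(word_to_synset_names("Waiters"))  # ['waiter.n.01', 'waiter.n.02']
print(word_to_synset_names("boxes"))    # ['box.n.01']
```

Because "Waiters" maps to more than one synset, you still have to pick the intended sense (here `waiter.n.01`) before comparing.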
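The hyponym-based check in the answer can also be phrased upward, via hypernyms, and extended to the "equal" case from the question by treating two words as equal when they share a synset. The stdlib-only sketch below uses hypothetical toy data (`TOY_HYPERNYMS`, `TOY_LEMMAS`) loosely modeled on WordNet, not the real database; `closure` mimics what NLTK's `Synset.closure` does (a breadth-first transitive traversal of a relation that excludes the starting synset). With real WordNet you would use `synset.closure(lambda x: x.hypernyms())` and `synset.lemma_names()` instead. Note that `employee.n.01` and `worker.n.01` are separate synsets in WordNet, so a synset-identity check will not report "employee" and "worker" as equal.

```python
from collections import deque

# Hypothetical toy data loosely modeled on WordNet (not the real database).
# synset name -> direct hypernym synset names
TOY_HYPERNYMS = {
    "waiter.n.01": ["dining-room_attendant.n.01"],
    "dining-room_attendant.n.01": ["employee.n.01"],
    "employee.n.01": ["worker.n.01"],
    "worker.n.01": ["person.n.01"],
    "person.n.01": [],
    "box.n.01": ["container.n.01"],
    "container.n.01": [],
}

# lemma -> synsets it belongs to ("waiter" and "server" share one here)
TOY_LEMMAS = {
    "waiter": ["waiter.n.01"],
    "server": ["waiter.n.01"],
    "employee": ["employee.n.01"],
    "worker": ["worker.n.01"],
    "box": ["box.n.01"],
}


def hypernyms(name):
    """Direct hypernyms of a toy synset."""
    return TOY_HYPERNYMS.get(name, [])


def closure(start, relation):
    """Breadth-first transitive closure of `relation`, excluding `start`."""
    seen, queue, result = {start}, deque([start]), []
    while queue:
        for nxt in relation(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                result.append(nxt)
                queue.append(nxt)
    return result


def compare(word_a, word_b):
    """Return 'equal', '<x> more general than <y>', or 'unrelated'."""
    senses_a, senses_b = TOY_LEMMAS[word_a], TOY_LEMMAS[word_b]
    # Equal: the words share at least one synset (they are synonyms).
    if set(senses_a) & set(senses_b):
        return "equal"
    # Try every pair of senses, since a word can have many synsets.
    for sa in senses_a:
        ancestors_a = closure(sa, hypernyms)
        for sb in senses_b:
            if sb in ancestors_a:
                return f"{word_b} more general than {word_a}"
            if sa in closure(sb, hypernyms):
                return f"{word_a} more general than {word_b}"
    return "unrelated"


print(compare("employee", "waiter"))  # employee more general than waiter
print(compare("waiter", "server"))    # equal
print(compare("employee", "worker"))  # worker more general than employee
print(compare("waiter", "box"))       # unrelated
```

With this toy data, "employee" vs. "worker" comes out as a generality relation rather than "equal"; whether real WordNet agrees depends on its actual hypernym links, which you can inspect with `wn.synset('employee.n.01').hypernyms()`.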