
How can I get the same WordNet output from the terminal in Python/NLTK?


I have WordNet installed on my machine, and when I run the terminal command

wn funny -synsa

I get the following output:

[Screenshot of the terminal output; not reproduced here]

Now I would like to get the same information within Python using the NLTK package. For example, if I run

from nltk.corpus import wordnet

synset_name = 'amusing.s.02'

for l in wordnet.synset(synset_name).lemmas():
    print('Lemma: {}'.format(l.name()))

I get all the lemmas I see in the terminal output (i.e., amusing, comic, comical, funny, laughable, mirthful, risible). However, what does the "=> humorous (vs. humorless), humourous" part of the terminal output mean, and how can I get it with NLTK? It looks like a hypernym, but adjectives don't have hypernym relationships.


Solution

  • From https://wordnet.princeton.edu/documentation/wn1wn

    -syns (n | v | a | r) Display synonyms and immediate hypernyms of synsets containing searchstr. Synsets are ordered by estimated frequency of use. For adjectives, if searchstr is in a head synset, the cluster's satellite synsets are displayed in place of hypernyms. If searchstr is in a satellite synset, its head synset is also displayed.

    To emulate the behavior in NLTK, you'll need to:

    • filter the synsets by POS
    • loop through the synsets
    • print the .lemma_names() of each synset
    • if the synset has immediate hypernyms, print them
    • otherwise (as for adjectives):
      • if the synset is a head synset, print the cluster's satellite synsets in place of hypernyms
      • if the synset is a satellite synset, print its head synset

    In code:

    import nltk
    from nltk.corpus import wordnet as wn

    nltk.download('wordnet')

    word = 'funny'

    for ss in wn.synsets(word, 'a'):
      print(', '.join(ss.lemma_names()))
      # if there are immediate hypernyms, print the first one
      if ss.hypernyms():
        print(ss.hypernyms()[0])
      # if the synset is a satellite sense ('s' POS),
      # print its head synset, i.e. the one with 'a' POS
      elif ss.pos() == 's':
        head_ss = ss.similar_tos()[0]
        head_parts = []
        for lemma in head_ss.lemmas():
          part = lemma.name()
          # antonyms are stored on lemmas, not on synsets
          if lemma.antonyms():
            part += f" (vs. {lemma.antonyms()[0].name()})"
          head_parts.append(part)
        print("   ==> " + ", ".join(head_parts))
      print()

    [out]:

    amusing, comic, comical, funny, laughable, mirthful, risible
       ==> humorous (vs. humorless), humourous

    curious, funny, odd, peculiar, queer, rum, rummy, singular
       ==> strange (vs. familiar), unusual

    fishy, funny, shady, suspect, suspicious
       ==> questionable (vs. unquestionable)

    funny
       ==> ill (vs. well), sick

    Note: In WordNet, antonymy is a relation between lemmas (word senses), not between synsets. NLTK therefore exposes antonyms() on Lemma objects only; a Synset has no antonym data of its own, so the (vs. ...) part has to be looked up on the lemmas of the head synset, as done above.
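
The loop above only covers the case where the search word sits in a satellite synset. For the other case described in the wn documentation, where the word is in a head synset, the following is a minimal sketch of how the cluster's satellites might be listed instead. The helper names `satellite_lines` and `print_head_clusters` are made up for illustration, and the sketch assumes that `similar_tos()` on a head synset (POS 'a') returns the satellites of its cluster.

```python
def satellite_lines(head_synset):
    """Return one display line per satellite synset of a head synset.

    Assumption: head_synset behaves like an NLTK Synset whose pos() is 'a',
    and its similar_tos() lists the satellite synsets of the cluster.
    """
    return [
        "   ==> " + ", ".join(sat.lemma_names())
        for sat in head_synset.similar_tos()
        if sat.pos() == "s"  # keep satellite synsets only
    ]


def print_head_clusters(word):
    """Print each head adjective synset of `word` with its satellites."""
    # Imported here so that satellite_lines() can be reused without NLTK;
    # the WordNet data must already be downloaded.
    from nltk.corpus import wordnet as wn

    for ss in wn.synsets(word, pos="a"):
        if ss.pos() != "a":
            continue  # satellite senses are handled by the loop shown earlier
        print(", ".join(ss.lemma_names()))
        for line in satellite_lines(ss):
            print(line)
        print()
```

Calling `print_head_clusters('humorous')` should then print the head synset's lemmas followed by one `==>` line per satellite. The exact layout of the real `wn` tool may differ, so treat this as a starting point rather than a faithful reimplementation.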