python nlp nltk wordnet wsd

How to get the wordnet sense frequency of a synset in NLTK?


According to the documentation, I can load a sense-tagged corpus in NLTK as such:

>>> from nltk.corpus import wordnet_ic
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')

I can also get the definition, POS, offset, and examples as such:

>>> from nltk.corpus import wordnet as wn
>>> wn.synset('dog.n.01').examples()
>>> wn.synset('dog.n.01').definition()

But how can I get the frequency of a synset from a corpus? To break down the question:

  1. First, how do I count how many times a synset occurs in a sense-tagged corpus?
  2. Then, the next step is to divide that count by the total number of occurrences of all synsets for the given lemma.

Solution

  • I managed to do it this way.

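    from nltk.corpus import wordnet as wn

    word = "dog"
    synsets = wn.synsets(word)

    # Map each sense (keyed as "offset-pos") to its summed lemma count.
    sense2freq = {}
    for s in synsets:
        freq = 0
        # In NLTK 3, lemmas(), offset() and pos() are methods.
        for lemma in s.lemmas():
            freq += lemma.count()
        # offset() returns an int, so convert it before concatenating.
        sense2freq[str(s.offset()) + "-" + s.pos()] = freq

    for s in sense2freq:
        print(s, sense2freq[s])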
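
The snippet above only covers the first step of the question (raw counts). A minimal sketch of the second step, dividing each count by the total for the word, might look like the following. The helper names `relative_frequencies` and `sense_counts` are made up for illustration and are not part of NLTK. Unlike the loop above, `sense_counts` counts only the lemma whose name matches the query word, which is an assumption about what "given the particular lemma" means in the question, and it assumes the word is passed in its base form.

```python
def relative_frequencies(sense2freq):
    """Turn raw per-sense counts into proportions that sum to 1."""
    total = sum(sense2freq.values())
    if total == 0:
        # No counted occurrences at all: avoid dividing by zero.
        return {sense: 0.0 for sense in sense2freq}
    return {sense: count / total for sense, count in sense2freq.items()}


def sense_counts(word):
    """Raw count per synset of `word`, using only the matching lemma.

    Hypothetical helper; needs NLTK and the WordNet data to be installed.
    """
    from nltk.corpus import wordnet as wn

    counts = {}
    for synset in wn.synsets(word):
        counts[synset.name()] = sum(
            lemma.count()
            for lemma in synset.lemmas()
            if lemma.name().lower() == word.lower()
        )
    return counts
```

Calling `relative_frequencies(sense_counts("dog"))` would then give, for each synset of "dog", its share of all counted occurrences of that word. If no sense has a nonzero count, every share is reported as 0.0 rather than raising an error.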