Tags: python, nlp, nltk, wordnet, part-of-speech

WordNet - What do the "n" and the number in a synset name represent?


My question is about the NLTK WordNet interface.

    >>> from nltk.corpus import wordnet as wn
    >>> wn.synsets('cat')
    [Synset('cat.n.01'), Synset('guy.n.01'), Synset('cat.n.03'),
     Synset('kat.n.01'), Synset('cat-o'-nine-tails.n.01'),
     Synset('caterpillar.n.02'), Synset('big_cat.n.01'),
     Synset('computerized_tomography.n.01'), Synset('cat.v.01'),
     Synset('vomit.v.01')]

I could not find out what the n and the number after it mean in names such as cat.n.01 or caterpillar.n.02.


Solution

  • Per the NLTK docs, a <lemma>.<pos>.<number> Synset string is composed of the following parts:

    • <lemma> is the word’s morphological stem
    • <pos> is one of the module attributes ADJ, ADJ_SAT, ADV, NOUN or VERB
    • <number> is the sense number (the docs say "counting from 0", but the names start at 01, so cat.n.01 is the first sense)

    So <pos> is the part of speech. According to the WordNet man page, the part-of-speech letter has the following meaning:

    n    NOUN
    v    VERB
    a    ADJECTIVE
    s    ADJECTIVE SATELLITE
    r    ADVERB 
    

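    The <number> distinguishes the different senses of one lemma within one part of speech: cat.n.01 and cat.n.03 are the first and third noun senses of "cat", while cat.v.01 is its first verb sense. The lemma in a synset name is the synset's first lemma, not necessarily the word you looked up, which is why wn.synsets('cat') also returns names such as guy.n.01.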
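To make the three parts concrete, here is a minimal plain-Python sketch that decodes the synset names from the question's output without calling NLTK. POS_NAMES, parse_synset_name and senses_by_lemma are hypothetical helpers written for this illustration, not NLTK API, and the sketch assumes every name ends in .<pos>.<number>. With NLTK itself, Synset.name() and Synset.pos() return a synset's full name and its POS letter, and the module constants wn.NOUN, wn.VERB, wn.ADJ, wn.ADJ_SAT and wn.ADV hold the letters 'n', 'v', 'a', 's' and 'r'.

```python
from collections import defaultdict

# POS letters as listed in the WordNet man page table (hypothetical helper name).
POS_NAMES = {
    "n": "NOUN",
    "v": "VERB",
    "a": "ADJECTIVE",
    "s": "ADJECTIVE SATELLITE",
    "r": "ADVERB",
}


def parse_synset_name(name):
    """Split '<lemma>.<pos>.<number>' into (lemma, pos, sense_number).

    Splitting from the right (at most twice) leaves any '.' that might
    appear inside the lemma untouched.
    """
    lemma, pos, number = name.rsplit(".", 2)
    if pos not in POS_NAMES:
        raise ValueError(f"unknown part-of-speech letter {pos!r} in {name!r}")
    return lemma, pos, int(number)  # '01' -> 1


def senses_by_lemma(names):
    """Group the sense numbers of several names by (lemma, pos), sorted."""
    groups = defaultdict(list)
    for name in names:
        lemma, pos, number = parse_synset_name(name)
        groups[(lemma, pos)].append(number)
    return {key: sorted(numbers) for key, numbers in groups.items()}


# Synset names copied from the wn.synsets('cat') output in the question.
CAT_SYNSETS = [
    "cat.n.01", "guy.n.01", "cat.n.03", "kat.n.01",
    "cat-o'-nine-tails.n.01", "caterpillar.n.02", "big_cat.n.01",
    "computerized_tomography.n.01", "cat.v.01", "vomit.v.01",
]

if __name__ == "__main__":
    for name in CAT_SYNSETS:
        lemma, pos, number = parse_synset_name(name)
        print(name, "->", lemma, POS_NAMES[pos], "sense", number)
    print(senses_by_lemma(CAT_SYNSETS)[("cat", "n")])  # [1, 3]
```

Grouping by (lemma, pos) shows the disambiguation at work: in this list the noun "cat" has senses 1 and 3 and the verb "cat" has sense 1, while the missing noun sense 2 of "cat" appears under a different name (guy.n.01) because that synset is named after another of its lemmas.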