Tags: python, nlp, nltk, wordnet, part-of-speech

Python: map NLTK Stanford POS tags to WordNet POS tags


I'm reading a list of sentences and tagging each word with NLTK's Stanford POS tagger. I get outputs like so:

wordnet_sense = []

for o in output:
    a = st.tag(o)
    wordnet_sense.append(a)

outputs: [[(u'feel', u'VB'), (u'great', u'JJ')], [(u'good', u'JJ')]]

I want to map each word's POS tag to the corresponding WordNet POS, so that the word can be looked up in WordNet.

I've attempted this:

sense = []

for i in wordnet_sense:
    tmp = []

    for tok, pos in i:
        lower_pos = pos[0].lower()

        if lower_pos in ['a', 'n', 'v', 'r', 's']:
            res = wn.synsets(tok, lower_pos)
            if len(res) > 0:
                a = res[0]
        else:
            a = "[{0}, {1}]".format(tok, pos)

        tmp.append(a)

    sense.append(tmp)

print sense

outputs: [[Synset('feel.v.01'), '[great, JJ]'], ['[good, JJ]']]

So feel is recognised as a verb, but great and good are not recognised as adjectives. I also checked whether great and good are actually in WordNet, since I thought words missing from WordNet would not be mapped, but both are there. Can anyone help?


Solution

  • Here's a cute function from pywsd:

    from nltk.corpus import wordnet as wn
    
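    def penn2morphy(penntag, returnNone=False):
        # Penn Treebank tags share a two-letter prefix per word class
        # (JJ, JJR and JJS are all adjectives), so match on that prefix.
        morphy_tag = {'NN':wn.NOUN, 'JJ':wn.ADJ,
                      'VB':wn.VERB, 'RB':wn.ADV}
        try:
            return morphy_tag[penntag[:2]]
        except KeyError:
            # No WordNet equivalent (determiners, prepositions, etc.).
            return None if returnNone else ''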
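
For context on why the attempt in the question only worked for verbs: lowercasing the first letter of a Penn Treebank tag happens to give a valid WordNet POS for VB ('v'), NN ('n') and RB ('r'), but JJ becomes 'j', while WordNet uses 'a' for adjectives. The sketch below contrasts the two approaches on the tagged output from the question. The function names are made up for illustration and are not part of pywsd or NLTK, and the WordNet POS letters are written out as plain strings (they match NLTK's wn.NOUN, wn.VERB, wn.ADJ and wn.ADV) so that the example runs without NLTK installed.

```python
# WordNet POS letters, hardcoded for illustration. In NLTK these are
# wn.NOUN == 'n', wn.VERB == 'v', wn.ADJ == 'a' and wn.ADV == 'r'.
PENN_PREFIX_TO_WORDNET = {'NN': 'n', 'VB': 'v', 'JJ': 'a', 'RB': 'r'}


def naive_first_letter(penn_tag):
    """The question's approach: lowercase the first letter of the tag."""
    return penn_tag[0].lower()


def penn_to_wordnet(penn_tag):
    """Return the WordNet POS letter for a Penn tag, or None if there is none."""
    return PENN_PREFIX_TO_WORDNET.get(penn_tag[:2])


def map_tagged_sentences(tagged_sentences):
    """Replace each Penn tag with its WordNet POS (None where unmapped)."""
    return [[(tok, penn_to_wordnet(pos)) for tok, pos in sentence]
            for sentence in tagged_sentences]


# Tagger output from the question (unicode prefixes dropped).
tagged = [[('feel', 'VB'), ('great', 'JJ')], [('good', 'JJ')]]

print(naive_first_letter('JJ'))      # j
print(map_tagged_sentences(tagged))  # [[('feel', 'v'), ('great', 'a')], [('good', 'a')]]
```

Each (token, pos) pair with a non-None pos can then be passed to wn.synsets(token, pos). It is probably also worth handling an empty result explicitly: in the question's loop, when res is empty the variable a keeps whatever value it had from the previous token.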