I'm reading a list of sentences and tagging each word with NLTK's Stanford POS tagger. I get outputs like so:
wordnet_sense = []
for o in output:
a = st.tag(o)
wordnet_sense.append(a)
outputs: [[(u'feel', u'VB'), (u'great', u'JJ')], [(u'good', u'JJ')]]
I want to map these words with their POS, so that they are recognised in WordNet.
I've attempted this:
sense = []
for i in wordnet_sense:
tmp = []
for tok, pos in i:
lower_pos = pos[0].lower()
if lower_pos in ['a', 'n', 'v', 'r', 's']:
res = wn.synsets(tok, lower_pos)
if len(res) > 0:
a = res[0]
else:
a = "[{0}, {1}]".format(tok, pos)
tmp.append(a)
sense.append(tmp)
print sense
outputs: [Synset('feel.v.01'), '[great, JJ]'], ['[good, JJ]']]
So feel
is recognised as a verb, but great
and good
are not recognised as adjectives. I've also checked if great
and good
actually belong in Wordnet because I thought they weren't being mapped if they weren't there, but they are. Can anyone help?
Here's a cute function from pywsd
:
from nltk.corpus import wordnet as wn
def penn2morphy(penntag, returnNone=False):
morphy_tag = {'NN':wn.NOUN, 'JJ':wn.ADJ,
'VB':wn.VERB, 'RB':wn.ADV}
try:
return morphy_tag[penntag[:2]]
except:
return None if returnNone else ''