machine-learning nlp hidden-markov-models

Models for classifying noun phrases?


I need a model for the following task:

Given a sequence of words with their POS tags, I want to decide whether the sequence is a noun phrase or not.

One model I can think of is an HMM.

For the sequences that are noun phrases, we train one HMM (HMM+). For those that are not noun phrases, we train another HMM (HMM-). To classify a new sequence, we compute P(sequence | HMM+) and P(sequence | HMM-). If the former is larger, we label the sequence a noun phrase; otherwise we do not.
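To make the comparison concrete, here is a minimal sketch of the scoring step, assuming both HMMs already exist. It uses the forward algorithm to get log P(sequence | HMM) and picks the larger one. Everything specific in it is an assumption for illustration only: the hidden state names, the hand-set probabilities, the Penn Treebank style tags, and the crude `FLOOR` value for unseen tags. In practice the parameters would be estimated from the two training sets (for example with Baum-Welch), not written by hand.

```python
import math

FLOOR = 1e-6  # assumed crude smoothing for tags a state never emits

# Hypothetical toy model for noun phrases (HMM+).
HMM_POS = {
    "states": ["pre", "head"],
    "start": {"pre": 0.8, "head": 0.2},
    "trans": {"pre": {"pre": 0.4, "head": 0.6},
              "head": {"pre": 0.1, "head": 0.9}},
    "emit": {"pre": {"DT": 0.5, "JJ": 0.5},
             "head": {"NN": 0.7, "NNS": 0.3}},
}

# Hypothetical toy model for everything else (HMM-).
HMM_NEG = {
    "states": ["a", "b"],
    "start": {"a": 0.5, "b": 0.5},
    "trans": {"a": {"a": 0.5, "b": 0.5},
              "b": {"a": 0.5, "b": 0.5}},
    "emit": {"a": {"VB": 0.4, "IN": 0.3, "RB": 0.3},
             "b": {"DT": 0.2, "NN": 0.3, "VB": 0.3, "JJ": 0.2}},
}


def forward_log_likelihood(obs, hmm):
    """Return log P(obs | hmm) via the scaled forward algorithm.

    obs must be a non-empty list of POS tags.
    """
    states = hmm["states"]
    log_lik = 0.0
    prev = None
    for t, tag in enumerate(obs):
        if t == 0:
            alpha = {s: hmm["start"][s] * hmm["emit"][s].get(tag, FLOOR)
                     for s in states}
        else:
            alpha = {s: hmm["emit"][s].get(tag, FLOOR)
                        * sum(prev[r] * hmm["trans"][r][s] for r in states)
                     for s in states}
        # Rescale at every step to avoid underflow on long sequences;
        # the log of the scaling factors sums to the log-likelihood.
        total = sum(alpha.values())
        log_lik += math.log(total)
        prev = {s: a / total for s, a in alpha.items()}
    return log_lik


def is_noun_phrase(tags):
    """Label the tag sequence a noun phrase if HMM+ explains it better."""
    return (forward_log_likelihood(tags, HMM_POS)
            > forward_log_likelihood(tags, HMM_NEG))
```

With these toy numbers, `is_noun_phrase(["DT", "JJ", "NN"])` comes out true and `is_noun_phrase(["VB", "IN", "DT"])` comes out false. Note that this compares raw likelihoods only; if noun phrases are much rarer than non-phrases in the data, a class prior would have to be added to each score.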

What do you think of this approach? Do you know of any other models suited to this problem?


Solution

  • My hunch is that an HMM is not the right model. An HMM can be used to guess POS tags: it derives the most probable sequence of tags from prior probabilities and the conditional probabilities from one token to the next.

    I don't see how this model fits the task of judging a complete noun phrase.

    Any probability-based approach will be very difficult to train, because noun phrases can contain many tokens, which makes for a very large number of possible combinations. To estimate useful probabilities, you need a very large training set.

    You can probably get a good enough start quickly by hand-crafting a set of grammar rules over POS tags, for example regular expressions, following the description in

    http://en.wikipedia.org/wiki/Noun_phrase#Components_of_noun_phrases

    or any other linguistic description of noun phrases.
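As a rough illustration of the rule-based route, here is a minimal sketch of a single regular expression over POS tags. The pattern is my own simplification and only an assumption about what a useful first rule might look like: it assumes Penn Treebank tags and accepts an optional determiner or possessive pronoun, any number of adjectives, and then one or more nouns. Real noun phrases (with prepositional phrases, conjunctions, relative clauses and so on) need more rules than this. The names `NP_PATTERN` and `matches_np_rule` are made up for the example.

```python
import re

# Hypothetical rule: (determiner | possessive pronoun)? adjective* noun+
# Each tag is followed by one space so that tags cannot run together.
NP_PATTERN = re.compile(r"(DT |PRP\$ )?(JJ[RS]? )*(NNP?S? )+")


def matches_np_rule(tags):
    """Return True if the whole tag sequence matches the noun phrase rule."""
    tag_string = "".join(tag + " " for tag in tags)
    return NP_PATTERN.fullmatch(tag_string) is not None
```

For example, `matches_np_rule(["DT", "JJ", "NN"])` and `matches_np_rule(["PRP$", "NNS"])` match, while `matches_np_rule(["DT", "VB"])` does not. Matching on the tag string rather than the words keeps the rule independent of vocabulary, so no training data is needed to get started.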