Tags: python, nlp, nltk, pos-tagger

NLTK identifies verb as Noun in Imperatives


I am using the NLTK POS tagger as below:

from nltk import pos_tag, word_tokenize

sent1 = 'get me now'
sent2 = 'run fast'
tags = pos_tag(word_tokenize(sent2))
print(tags)
[('run', 'NN'), ('fast', 'VBD')]

I found a similar post, "NLTK Thinks that Imperatives are Nouns", which suggests adding the word to a dictionary as a verb. The problem is that I have too many such unknown words. I do have one clue, though: they always appear at the start of a phrase.

E.g. 'Download now', 'Book it now', 'Sign up'

How can I help NLTK produce the correct tags for these imperatives?


Solution

  • There are other third-party models that you can load in NLTK. Take a look at "Python NLTK pos_tag not returning the correct part-of-speech tag".


    To answer the question with a hack: you can trick the POS tagger by prepending a pronoun so that the verb gets a subject, then drop the pronoun from the output, e.g.

    >>> from nltk import pos_tag
    >>> sent1 = 'get me now'.split()
    >>> sent2 = 'run fast'.split()
    >>> pos_tag(['He'] + sent1)
    [('He', 'PRP'), ('get', 'VBD'), ('me', 'PRP'), ('now', 'RB')]
    >>> pos_tag(['He'] + sent1)[1:]
    [('get', 'VBD'), ('me', 'PRP'), ('now', 'RB')]
    

    To wrap the trick in a function:

    >>> from nltk import pos_tag
    >>> sent1 = 'get me now'.split()
    >>> sent2 = 'run fast'.split()
    >>> def imperative_pos_tag(sent):
    ...     return pos_tag(['He']+sent)[1:]
    ... 
    >>> imperative_pos_tag(sent1)
    [('get', 'VBD'), ('me', 'PRP'), ('now', 'RB')]
    >>> imperative_pos_tag(sent2)
    [('run', 'VBP'), ('fast', 'RB')]
    

    If you want all verbs in your imperative to receive the base-form VB tag:

    >>> from nltk import pos_tag
    >>> sent1 = 'get me now'.split()
    >>> sent2 = 'run fast'.split()
    >>> def imperative_pos_tag(sent):
    ...     return [(word, tag[:2]) if tag.startswith('VB') else (word,tag) for word, tag in pos_tag(['He']+sent)[1:]]
    ... 
    >>> imperative_pos_tag(sent1)
    [('get', 'VB'), ('me', 'PRP'), ('now', 'RB')]
    >>> imperative_pos_tag(sent2)
    [('run', 'VB'), ('fast', 'RB')]
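
The pieces above can be combined into one helper. The following is only a sketch under stated assumptions, not part of the original answer: the names `tag_imperative`, `dummy_subject` and `force_first_verb` are hypothetical, and the tagger is passed in as a callable so that a third-party model can be swapped in. The optional `force_first_verb` flag relies on the asker's own observation that the unknown verbs always appear at the start of a phrase; it will mis-tag any phrase that does not start with a verb.

```python
from typing import Callable, List, Sequence, Tuple

Tagged = List[Tuple[str, str]]


def _nltk_tagger(tokens: Sequence[str]) -> Tagged:
    # Imported lazily so the helper also works with any other tagger callable.
    from nltk import pos_tag
    return pos_tag(list(tokens))


def tag_imperative(
    tokens: Sequence[str],
    tagger: Callable[[Sequence[str]], Tagged] = _nltk_tagger,
    dummy_subject: str = "He",        # hypothetical parameter name
    force_first_verb: bool = False,   # hypothetical parameter name
) -> Tagged:
    """Tag an imperative phrase by giving its verb a temporary subject."""
    tokens = list(tokens)
    if not tokens:
        return []

    # Prepend the dummy subject, tag, then drop the dummy from the result.
    tagged = tagger([dummy_subject] + tokens)[1:]

    # Collapse every verb tag (VBD, VBP, VBZ, ...) to the base form VB.
    result = [(w, "VB") if t.startswith("VB") else (w, t) for w, t in tagged]

    # Assumption taken from the question: the first token is always the verb.
    if force_first_verb:
        result[0] = (result[0][0], "VB")
    return result
```

With NLTK and its tagger model installed, `tag_imperative('run fast'.split())` should behave like the last `imperative_pos_tag` above. Passing `force_first_verb=True` is a blunt fallback for cases such as 'Download now', where the tagger may still pick a noun tag for the first word even with the dummy subject.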