python nltk pos-tagger perceptron

python: modify PerceptronTagger in nltk to recognize 'and/or'


How can I modify PerceptronTagger in the nltk module (or add some temporary functionality to it) so that it tags 'and/or' as 'CC'?


Solution

  • If this is the only thing you want to change, the simplest solution is to just post-process the tagged text:

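    # tagged_sentences: a list of sentences, each a list of (word, tag) tuples
    for sentence in tagged_sentences:
        for n, (word, tag) in enumerate(sentence):
            if word == 'and/or':
                # Replace the tagger's guess with the coordinating-conjunction tag
                sentence[n] = (word, "CC")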
    

    But if your question is a first step toward "improving" NLTK's tagger, you should take the long view and think about how you could build or install a better tagger. Take a look at the many links included in this answer.
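
If you want the post-processing step to feel like "temporary functionality" on the tagger itself, one option is a thin wrapper around it. The following is only a sketch: `apply_tag_overrides`, `OverrideTagger` and `StubTagger` are hypothetical names that are not part of NLTK, and `StubTagger` is a stand-in so the example runs without NLTK or its model data. The only assumption about the real tagger is that it exposes a `tag(tokens)` method returning `(word, tag)` pairs.

```python
def apply_tag_overrides(tagged_sentence, overrides):
    """Return a new list of (word, tag) pairs, replacing the tag of any
    word that appears (exact match) in the overrides dict."""
    return [(word, overrides.get(word, tag)) for word, tag in tagged_sentence]


class OverrideTagger:
    """Hypothetical wrapper: delegates to any object with a tag(tokens)
    method, then patches the result with fixed word -> tag overrides."""

    def __init__(self, base_tagger, overrides):
        self.base_tagger = base_tagger
        self.overrides = dict(overrides)

    def tag(self, tokens):
        return apply_tag_overrides(self.base_tagger.tag(tokens), self.overrides)


class StubTagger:
    """Stand-in for a real tagger (an assumption for this demo only):
    it labels every token 'NN'."""

    def tag(self, tokens):
        return [(tok, "NN") for tok in tokens]


tagger = OverrideTagger(StubTagger(), {"and/or": "CC"})
result = tagger.tag(["cats", "and/or", "dogs"])
print(result)  # [('cats', 'NN'), ('and/or', 'CC'), ('dogs', 'NN')]
```

With NLTK installed and its tagger model downloaded, `StubTagger()` could be swapped for `nltk.tag.PerceptronTagger()`, which also provides `tag(tokens)`. This only helps if your tokenizer keeps 'and/or' as a single token; if it splits on the slash, the override will never match.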