python nltk colorama

Coloring text in terminal according to part of speech


I'd like to color a sentence in the terminal so that nouns are blue and verbs are green. Everything else should stay black.

So far, I have tried to use the nltk and colorama modules for this purpose.

import nltk
from colorama import Fore

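The following code tags each word with its part of speech. In the output, verbs are tagged VB or VBD and nouns are tagged NN.

sample_sentence = 'Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of lather on which a mirror and a razor lay crossed.'
s = nltk.word_tokenize(sample_sentence)
tagged_text = nltk.pos_tag(s)
print(tagged_text)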

[('Stately', 'RB'), (',', ','), ('plump', 'VB'), ('Buck', 'NNP'), ('Mulligan', 'NNP'), ('came', 'VBD'), ('from', 'IN'), ('the', 'DT'), ('stairhead', 'NN'), (',', ','), ('bearing', 'VBG'), ('a', 'DT'), ('bowl', 'NN'), ('of', 'IN'), ('lather', 'NN'), ('on', 'IN'), ('which', 'WDT'), ('a', 'DT'), ('mirror', 'NN'), ('and', 'CC'), ('a', 'DT'), ('razor', 'NN'), ('lay', 'NN'), ('crossed', 'VBD'), ('.', '.')]
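
Note that the output above also contains tags such as NNP and VBG, which an exact match on VB, VBD, and NN would leave uncolored. A minimal sketch of matching by prefix instead, assuming Penn Treebank style tags where noun tags begin with NN and verb tags begin with VB; the helper name `category` is hypothetical:

```python
def category(tag):
    """Map a Penn Treebank style tag to 'noun', 'verb', or 'other' by prefix."""
    if tag.startswith('NN'):   # NN, NNS, NNP, NNPS
        return 'noun'
    if tag.startswith('VB'):   # VB, VBD, VBG, VBN, VBP, VBZ
        return 'verb'
    return 'other'

# A few pairs taken from the tagger output above.
tagged = [('plump', 'VB'), ('Buck', 'NNP'), ('bearing', 'VBG'), ('from', 'IN')]
print([(word, category(tag)) for word, tag in tagged])
```

Whether proper nouns and participles should be colored is a judgment call; exact matching on the three tags is equally valid if only those are wanted.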

To print colored text, I use:

print(Fore.BLUE + some_noun)
print(Fore.GREEN + some_verb)
print(Fore.BLACK + something_else)

I'm having trouble printing the whole sentence. How would you loop through tagged_text so that it prints sample_sentence unchanged, with only the desired colors applied?


Solution

  • How about this? It keeps the whitespace exactly as in the original text. I do believe verbs ought to be red though.

    from colorama import Fore, init
    import re
    init()
    
    tagged_text = [('Stately', 'RB'), (',', ','), ('plump', 'VB'), ('Buck', 'NNP'), ('Mulligan', 'NNP'), ('came', 'VBD'),
                    ('from', 'IN'), ('the', 'DT'), ('stairhead', 'NN'), (',', ','), ('bearing', 'VBG'), ('a', 'DT'), 
                    ('bowl', 'NN'), ('of', 'IN'), ('lather', 'NN'), ('on', 'IN'), ('which', 'WDT'), ('a', 'DT'),
                    ('mirror', 'NN'), ('and', 'CC'), ('a', 'DT'),('razor', 'NN'), ('lay', 'NN'), ('crossed', 'VBD'),
                    ('.', '.'), ('The', 'DET'), ('function', 'NN'), ('f', 'SYM'), ('(','('),('x','SYM'),(',',','),
                    ('y','SYM'),(')',')'),('takes','VB'), ('two', 'CD'), ('arguments', 'NN'), ('.','.')]
    origtext = 'Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of lather on which a mirror and a razor lay crossed. The function f(x,y) takes two arguments.'
    
    colordict = {'VB': Fore.GREEN, 'VBD': Fore.GREEN, 'NN': Fore.BLUE}
    
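    colorwords = ''
    for word, tag in tagged_text:
        # Match the token at the start of the remaining text, plus any surrounding whitespace.
        word = re.match(r'\s*%s\s*' % re.escape(word), origtext).group()
        # Drop the matched prefix so the next token is matched against the rest.
        origtext = origtext[len(word):]
        # Words whose tag is not in colordict are printed in black.
        colorwords += colordict.get(tag, Fore.BLACK) + word
    
    # Reset the foreground color so later terminal output is not affected.
    print(colorwords + Fore.RESET)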
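
The same idea can be sketched as a reusable function that walks the original text with `str.find` instead of a regex and resets the color after every colored word. This is only a sketch under some assumptions: the function name `colorize` is hypothetical, the tokens are assumed to appear in the text in order, and the color constants are written as raw ANSI escape sequences (the same strings as colorama's `Fore.BLUE`, `Fore.GREEN`, and `Fore.RESET`) so the example runs without any third-party module.

```python
# ANSI escape sequences; colorama's Fore.BLUE, Fore.GREEN and Fore.RESET
# are these same strings.
BLUE, GREEN, RESET = '\x1b[34m', '\x1b[32m', '\x1b[39m'

def colorize(text, tagged, colors):
    """Return text with each tagged word wrapped in its color, spacing unchanged."""
    out = []
    pos = 0  # index in text up to which everything has been copied
    for word, tag in tagged:
        start = text.find(word, pos)
        if start == -1:
            # The tokenizer may have rewritten the token (for example quotes);
            # skip it and leave the original text as it is.
            continue
        out.append(text[pos:start])      # text before the word, uncolored
        color = colors.get(tag)
        if color:
            out.append(color + word + RESET)
        else:
            out.append(word)             # untagged words keep the default color
        pos = start + len(word)
    out.append(text[pos:])               # whatever follows the last token
    return ''.join(out)

colors = {'VB': GREEN, 'VBD': GREEN, 'NN': BLUE}
tagged = [('a', 'DT'), ('razor', 'NN'), ('lay', 'NN'), ('crossed', 'VBD'), ('.', '.')]
print(colorize('a  razor lay crossed.', tagged, colors))
```

Leaving uncolored words in the terminal's default color, rather than forcing black, keeps them readable on dark backgrounds; on Windows consoles, calling colorama's `init()` first is still needed for the escape sequences to be interpreted.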