I'd like to color a sentence in the terminal so that nouns are blue and verbs are green. Everything else should be black.
So far, I have tried using the nltk and colorama modules for this purpose.
import nltk
from colorama import Fore
This code finds the nouns and verbs: verbs are tagged VB or VBD, and nouns are tagged NN.
s = nltk.word_tokenize(sample_sentence)
tagged_text = nltk.pos_tag(s)
print(tagged_text)
[('Stately', 'RB'), (',', ','), ('plump', 'VB'), ('Buck', 'NNP'), ('Mulligan', 'NNP'), ('came', 'VBD'), ('from', 'IN'), ('the', 'DT'), ('stairhead', 'NN'), (',', ','), ('bearing', 'VBG'), ('a', 'DT'), ('bowl', 'NN'), ('of', 'IN'), ('lather', 'NN'), ('on', 'IN'), ('which', 'WDT'), ('a', 'DT'), ('mirror', 'NN'), ('and', 'CC'), ('a', 'DT'), ('razor', 'NN'), ('lay', 'NN'), ('crossed', 'VBD'), ('.', '.')]
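Note that the output above contains noun and verb tags beyond NN, VB and VBD (for example NNP for "Buck" and VBG for "bearing"). In the Penn Treebank tag set that nltk.pos_tag uses by default for English, all noun tags start with NN and all verb tags start with VB. A minimal sketch of grouping tags by prefix, where the helper name tag_family is hypothetical and not part of nltk:

```python
def tag_family(tag):
    """Map a Penn Treebank tag to 'noun', 'verb' or 'other' (hypothetical helper)."""
    if tag.startswith("NN"):   # NN, NNS, NNP, NNPS
        return "noun"
    if tag.startswith("VB"):   # VB, VBD, VBG, VBN, VBP, VBZ
        return "verb"
    return "other"

# A few (word, tag) pairs taken from the tagger output shown above.
sample = [("Buck", "NNP"), ("came", "VBD"), ("bearing", "VBG"), ("bowl", "NN"), ("from", "IN")]
families = [tag_family(tag) for _, tag in sample]
print(families)
```

Whether proper nouns and participles should be colored is a design choice; matching on exact tags keeps them black.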
When I want to print colored text I will use:
print(Fore.BLUE + some_noun)
print(Fore.GREEN + some_verb)
print(Fore.BLACK + something_else)
I am having trouble printing the sentence. How would you loop through tagged_text so that it prints sample_sentence unchanged (with only the desired colors applied)?
How about this? It keeps the whitespace exactly as in the original text. I do believe verbs ought to be red though.
from colorama import Fore, init
import re
init()
tagged_text = [('Stately', 'RB'), (',', ','), ('plump', 'VB'), ('Buck', 'NNP'), ('Mulligan', 'NNP'), ('came', 'VBD'),
('from', 'IN'), ('the', 'DT'), ('stairhead', 'NN'), (',', ','), ('bearing', 'VBG'), ('a', 'DT'),
('bowl', 'NN'), ('of', 'IN'), ('lather', 'NN'), ('on', 'IN'), ('which', 'WDT'), ('a', 'DT'),
('mirror', 'NN'), ('and', 'CC'), ('a', 'DT'),('razor', 'NN'), ('lay', 'NN'), ('crossed', 'VBD'),
('.', '.'), ('The', 'DET'), ('function', 'NN'), ('f', 'SYM'), ('(','('),('x','SYM'),(',',','),
('y','SYM'),(')',')'),('takes','VB'), ('two', 'CD'), ('arguments', 'NN'), ('.','.')]
origtext = 'Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of lather on which a mirror and a razor lay crossed. The function f(x,y) takes two arguments.'
colordict = {'VB': Fore.GREEN, 'VBD': Fore.GREEN, 'NN': Fore.BLUE}
colorwords = ''
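for word, tag in tagged_text:
    # Default color for anything that is not a listed noun or verb tag
    color = Fore.BLACK
    # Match the token at the start of the remaining text, including surrounding whitespace
    word = re.match(r'\s*%s\s*' % re.escape(word), origtext).group()
    # Drop the matched prefix so the next token is matched against what is left
    origtext = origtext.split(word, 1)[1]
    if tag in colordict:
        color = colordict[tag]
    colorwords += color + word
# Reset the foreground color so later terminal output is unaffected
print(colorwords + Fore.RESET)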
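The same idea can also be sketched without regular expressions, by tracking an offset into the original text and locating each token with str.find. This is only an illustration under some assumptions: colorize is a hypothetical helper name, the color constants are raw ANSI escape codes (the same strings colorama exposes as Fore.GREEN, Fore.BLUE, Fore.BLACK and Fore.RESET), and tokens that the tokenizer rewrote so they no longer appear in the text are simply skipped.

```python
# ANSI foreground codes, equal to colorama's Fore.GREEN, Fore.BLUE, Fore.BLACK and Fore.RESET.
GREEN, BLUE, BLACK, RESET = "\033[32m", "\033[34m", "\033[30m", "\033[39m"

def colorize(text, tagged, colors, default=BLACK):
    """Return text with a color code inserted before each token (hypothetical helper)."""
    pieces = []
    pos = 0  # index of the first character not yet copied to the output
    for word, tag in tagged:
        start = text.find(word, pos)
        if start == -1:
            # The token does not occur in the remaining text; skip it rather than crash.
            continue
        # Copy any whitespace before the token unchanged, then the color code and the token.
        pieces.append(text[pos:start] + colors.get(tag, default) + word)
        pos = start + len(word)
    # Copy whatever is left and reset the foreground color.
    pieces.append(text[pos:] + RESET)
    return "".join(pieces)

tagged = [("Buck", "NNP"), ("came", "VBD"), ("from", "IN"),
          ("the", "DT"), ("stairhead", "NN"), (".", ".")]
original = "Buck  came from the stairhead."
colored = colorize(original, tagged, {"VB": GREEN, "VBD": GREEN, "NN": BLUE})
print(colored)
```

On Windows consoles that do not interpret ANSI codes, calling colorama's init() first, as in the code above, is still needed.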