I am trying to add the HTML <b> element to a list of words in a sentence. After some searching I got it almost working, except for ignoring case.
import re
bolds = ['test', 'tested'] # I want to bold these words, ignoring case
text = "Test lorem tested ipsum dolor sit amet test, consectetur TEST adipiscing elit test."
pattern = r'\b(?:' + "|".join(bolds) + r')\b'
dict_repl = {k: f'<b>{k}</b>' for k in bolds}
text_bolded = re.sub(pattern, lambda m: dict_repl.get(m.group(), m.group()), text)
print(text_bolded)
Output:
Test lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur TEST adipiscing elit <b>test</b>.
This output misses the <b> element for Test and TEST. In other words, I would like the output to be:
<b>Test</b> lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur <b>TEST</b> adipiscing elit <b>test</b>.
One hack is to explicitly add the capitalize and upper variants, like so:
bolds = bolds + [b.capitalize() for b in bolds] + [b.upper() for b in bolds]
But I am thinking there must be a better way to do this. Besides, the above hack will miss words like tesT, etc.
Thank you!
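There's no need for the dictionary or the function. Every replacement is just the matched text wrapped in the same tags, so you can get that with a back-reference in the replacement string. The dictionary lookup is also why Test and TEST are skipped: the pattern is case-sensitive, and the dictionary keys are lowercase only.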
Use flags=re.I to make the match case-insensitive.
text_bolded = re.sub(pattern, r'<b>\g<0></b>', text, flags=re.I)
\g<0> is a back-reference to the full match of the pattern, so the replacement keeps whatever casing appeared in the original text.
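Putting it together, here is a minimal runnable sketch using the data from the question. The re.escape call is an addition that is not in the original code; it is only a precaution in case a word in bolds ever contains a regex metacharacter.

```python
import re

bolds = ['test', 'tested']
text = "Test lorem tested ipsum dolor sit amet test, consectetur TEST adipiscing elit test."

# Build the alternation; re.escape is an added safeguard (assumption: the
# words may someday contain characters such as '.' or '+').
pattern = r'\b(?:' + "|".join(re.escape(b) for b in bolds) + r')\b'

# \g<0> inserts the whole match, so the original casing is preserved.
text_bolded = re.sub(pattern, r'<b>\g<0></b>', text, flags=re.I)
print(text_bolded)
# <b>Test</b> lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur <b>TEST</b> adipiscing elit <b>test</b>.
```

Because the match is case-insensitive, mixed-case variants such as tesT are wrapped as well, with no need to enumerate them in bolds.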