I use Python regular expressions for a profanity check. I have a blocked list of words, but there are some corner cases where I want to make exceptions for the bad words.
For example, I have ['foo', 'bar'] in the blocked list, but I want to exempt two cases: the all-caps "BAR", and "foo" when it is part of the phrase "foo good".
This is my current approach in Python:
import re

profanity_list = ['foo', 'bar']
# same as r'\b(foo|bar)\b'
pattern_profanity = re.compile(r'\b({})\b'.format('|'.join(profanity_list)),
                               flags=re.IGNORECASE)
s = 'foo BAR foo good Bar'
censor_char = '*'
print(pattern_profanity.sub(repl=lambda m: censor_char*len(m.group(0)), string=s))
This gives me "*** *** *** good ***", but I want the result to be "*** BAR foo good ***". What should I do to handle the exceptional cases? Is this feasible with a regular expression? Thanks.
BTW, the solution I found is from this post.
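You need a negative lookahead that rules out the whitelisted phrases, plus an inline case-insensitive group for the blocked words: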
import re
profanity_list = ['foo', 'bar']
whitelist = ["BAR", "foo good"]
pattern_profanity = re.compile(
r'\b(?!(?:{})\b)(?i:{})\b'.format('|'.join(whitelist),'|'.join(profanity_list)))
s = 'foo BAR foo good Bar'
censor_char = '*'
print( re.sub(pattern_profanity, lambda m: censor_char*len(m.group(0)), s) )
# => *** BAR foo good ***
See the Python demo
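If the lists change often, the same pattern can be built by a small helper. The following is only a sketch: the names build_censor_pattern and censor are made up for illustration, and the re.escape calls assume the list entries are literal words rather than regex fragments. The scoped (?i:...) group needs Python 3.6 or later.

```python
import re

def build_censor_pattern(blocked, allowed):
    """Hypothetical helper: compile the lookahead-based pattern from two lists."""
    # Assumption: entries are plain text, so escape any regex metacharacters.
    blocked_alt = '|'.join(re.escape(w) for w in blocked)
    if not allowed:
        # An empty lookahead alternation would reject every match, so skip it.
        return re.compile(r'\b(?i:{})\b'.format(blocked_alt))
    allowed_alt = '|'.join(re.escape(w) for w in allowed)
    # Exceptions are compared case-sensitively, blocked words case-insensitively.
    return re.compile(r'\b(?!(?:{})\b)(?i:{})\b'.format(allowed_alt, blocked_alt))

def censor(text, pattern, censor_char='*'):
    """Replace every match with censor_char repeated to the match length."""
    return pattern.sub(lambda m: censor_char * len(m.group(0)), text)

pattern = build_censor_pattern(['foo', 'bar'], ['BAR', 'foo good'])
print(censor('foo BAR foo good Bar', pattern))  # *** BAR foo good ***
```

The empty-whitelist branch matters because a lookahead with nothing in its alternation succeeds at every word boundary, which would make the negative lookahead block all matches.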
The pattern is \b(?!(?:BAR|foo good)\b)(?i:foo|bar)\b. See the regex demo. It matches:
\b - a word boundary
(?!(?:BAR|foo good)\b) - not immediately followed by BAR or foo good as a whole word
(?i:foo|bar) - a case-insensitive modifier group: foo or bar, matched as...
\b - a whole word.
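The pieces of the pattern can be checked one at a time. The snippet below is a rough sketch for experimenting, not part of the original answer: the mask helper and the "good foo" whitelist entry are hypothetical and exist only to show how the lookahead behaves.

```python
import re

def mask(pattern, text):
    # Replace each match with asterisks of the same length.
    return pattern.sub(lambda m: '*' * len(m.group(0)), text)

answer_pattern = re.compile(r'\b(?!(?:BAR|foo good)\b)(?i:foo|bar)\b')

# The lookahead is case-sensitive, so only the exact spelling BAR is exempt.
print(mask(answer_pattern, 'BAR bar Bar'))    # BAR *** ***

# The \b closing the lookahead stops "foo goodness" from passing as "foo good".
print(mask(answer_pattern, 'foo goodness'))   # *** goodness

# Hypothetical whitelist entry "good foo": the lookahead starts at "foo" and
# only looks forward, so the preceding "good" is never seen.
backward_pattern = re.compile(r'\b(?!(?:good foo)\b)(?i:foo)\b')
print(mask(backward_pattern, 'good foo'))     # good ***
```

The last case suggests that this approach fits whitelist entries that begin with the blocked word. An exception that depends on the text before the word would need a different construct, such as a lookbehind.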