I'm writing a very basic commenting system and want to implement a simple, efficient bad words filter.
I'm aware of the problems associated with bad word filters and realize it's basically impossible to write one that keeps misspellings and innuendo out, but I'm just wanting to write a very simple one that keeps correct spellings of vulgar words from being displayed.
I found a bad words list of about 400 words and put it into preg_replace() with the pattern being:
/(these|are|bad|words|like|ass)/
The problem is that it replaces any occurrence of the characters in the pattern, even if they are in the middle of a word. So, for example, assist will be replaced with ist.
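A minimal reproduction of the problem, assuming the matches are replaced with an empty string (the variable name $result is only for illustration):

```php
<?php
// The pattern has no word boundaries, so "ass" matches inside "assist".
$result = preg_replace('/(these|are|bad|words|like|ass)/', '', 'assist');

echo $result; // ist
```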
Second question: instead of replacing the bad words with an empty string, or with a fixed-width string such as ****, is there a way to replace each one with a string of asterisks of the same length as the replaced word?
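// Mask each whole-word match with as many asterisks as it has bytes.
// $filtered is just an illustrative name for the result.
$filtered = preg_replace_callback(
    '/\b(these|are|bad|words|like|ass)\b/',
    function (array $match) { return str_repeat('*', strlen($match[1])); },
    $comment
);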
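\b is a word boundary and will probably suffice for most cases, though it won't be perfect for all of them. For example, \b counts letters, digits, and underscores as word characters, so a bad word immediately followed by a digit or an underscore is not matched.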
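With a list of around 400 words, it may be easier to build the pattern from an array than to type it by hand. The following is only a sketch under a few assumptions: the function name censor_bad_words and the sample word list are made up for illustration, the mbstring extension is available for mb_strlen(), and the i and u modifiers (case-insensitive, UTF-8) are additions that the pattern above does not use.

```php
<?php
// Hypothetical helper: masks whole-word matches from $badWords with asterisks.
function censor_bad_words(string $comment, array $badWords): string
{
    // Escape each word so regex metacharacters in the list are taken literally.
    $escaped = array_map(
        function (string $word): string { return preg_quote($word, '/'); },
        $badWords
    );

    // \b limits matches to whole words; i ignores case; u treats input as UTF-8.
    $pattern = '/\b(' . implode('|', $escaped) . ')\b/iu';

    $result = preg_replace_callback(
        $pattern,
        function (array $match): string {
            // mb_strlen() counts characters rather than bytes.
            return str_repeat('*', mb_strlen($match[1]));
        },
        $comment
    );

    // preg_replace_callback() returns null on error (for example invalid UTF-8);
    // this sketch then falls back to the unfiltered comment.
    return $result ?? $comment;
}

echo censor_bad_words('Ass, assist and bad words.', ['bad', 'words', 'ass']);
// ***, assist and *** *****.
```

Because of the word boundaries, assist is left alone, while Ass is masked thanks to the i modifier. Whether to show or reject a comment when the regex call fails is a design choice; the fallback here simply returns the input unchanged.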