I implemented this "bad word" check function in PHP:
# bad word detector
function check_badwords($string) {
    $badwords = array(/* a number of words some may find inappropriate for SE */);
    foreach ($badwords as $item) {
        if (stripos($string, $item) !== false) return true;
    }
    return false;
}
It works alright, except I'm having a little problem. If the $string is:
Who is the best guitarist ever?
...it returns true, because there is a match with Who ($string) and ho (in $badwords array). How could the function be modified so that it only checks for complete words, and not just part of words?
Thanks!
In order to check for complete words you should use regular expressions:
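function check_badwords($string)
{
    $badwords = array(/* the big list of words here */);
    // Escape regex metacharacters in each word
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $badwords);
    // Create the regex; the "i" modifier keeps the check case-insensitive, like stripos()
    $re = '/\b('.implode('|', $quoted).')\b/i';
    // Check if it matches the sentence; preg_match() returns 1, 0 or false
    return preg_match($re, $string) === 1;
}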
How the regex works
The regular expression starts and ends with the special sequence \b, which matches a word boundary (i.e. the position where a word character is followed by a non-word character or vice versa; the word characters are letters, digits and the underscore).
Between the two word boundaries there is a subpattern that contains all the bad words separated by |. The subpattern matches any of the bad words.
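To see the word-boundary behaviour in isolation, here is a minimal sketch. The two-word list (`ho`, `darn`) is a hypothetical stand-in for the real list, and the use of `preg_quote()` and the `i` modifier is an assumption on my part, added so that special characters are matched literally and the check stays case-insensitive like `stripos()`.

```php
<?php
// Hypothetical word list, for illustration only.
$badwords = array('ho', 'darn');

// Escape each word so regex metacharacters are matched literally.
$quoted = array_map(function ($word) {
    return preg_quote($word, '/');
}, $badwords);

// \b on both sides means only whole words match; "i" ignores case.
$re = '/\b(' . implode('|', $quoted) . ')\b/i';

var_dump(preg_match($re, 'Who is the best guitarist ever?')); // int(0)
var_dump(preg_match($re, 'Well, darn it!'));                  // int(1)
var_dump(preg_match($re, 'HO HO HO'));                        // int(1)
```

The first string no longer matches because there is no word boundary between the "W" and the "h" in "Who".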
If you want to know what bad word was found you can change the function:
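function check_badwords($string)
{
    $badwords = array(/* the big list of words here */);
    // Escape regex metacharacters; the "i" modifier keeps the check case-insensitive
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $badwords);
    $re = '/\b('.implode('|', $quoted).')\b/i';
    // Check for matches, save the first match in $match
    $result = preg_match($re, $string, $match);
    // If $result is 1, $match[1] contains the first bad word found in $string;
    // use it here (or return it), because $match is local to this function
    return $result;
}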
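If every offending word is needed rather than only the first, one possible variation uses `preg_match_all()`. This is only a sketch: the name `find_badwords()` and the choice to pass the word list as a parameter are hypothetical and not part of the function above.

```php
<?php
// Hypothetical helper: returns every distinct bad word found in $string.
// Unlike check_badwords(), the word list is passed in as a parameter.
function find_badwords($string, array $badwords)
{
    if (empty($badwords)) {
        // An empty alternation would match at every word boundary.
        return array();
    }

    // Escape each word so regex metacharacters are matched literally.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $badwords);
    $re = '/\b(' . implode('|', $quoted) . ')\b/i';

    // $matches[1] holds the text captured by the group for each match.
    preg_match_all($re, $string, $matches);

    // Normalise case and drop duplicates.
    return array_values(array_unique(array_map('strtolower', $matches[1])));
}

print_r(find_badwords('Darn, that darn ho!', array('ho', 'darn')));
// Array ( [0] => darn [1] => ho )
```

An empty result array then means the string is clean, so the same function can double as the yes/no check.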