I am putting together a basic profanity filter in Java to detect profanity in user input. I am not trying to handle every possible scenario, which I know is probably impossible for a computer alone. However, I do want to handle a few basic scenarios that a computer is well suited to. In this particular case, I am trying to detect a user who tries to get around the filter by putting spaces between letters, for example: "hello, I am using a s m u r f word here" ("smurf" being the "bad" word).
In my current implementation I keep a list of words that I check the input text against:
public boolean containsBadWords(String text) {
    for (String word : badWords) {
        if (text.matches(".*\\b" + word + "\\b.*")) {
            return true;
        }
    }
    return false;
}
But this would not handle the spaced letters issue I described above.
Does anybody know how to collapse these words in Java so I can process them with a basic text-matching algorithm?
Prepare a list of forbidden words, then go over the list and convert each word into a regex that allows optional spaces after every letter, e.g. "smurf" -> " s *m *u *r *f * ":
String regex = " " + word.replaceAll("(.)", "$1 *") + " ";
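Then try to find it in the text. Note that the literal spaces at both ends of the pattern mean a word at the very start or end of the text, or one followed by punctuation, will not match: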
boolean found = Pattern.compile(regex).matcher(text).find();
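Putting the pieces together, here is a minimal sketch of the same idea as a reusable class. The class name `SpacedWordFilter` and its constructor are hypothetical, not from the snippets above. It also makes a few assumptions beyond the original: it uses `\b` word boundaries instead of literal spaces so matches at the start or end of the text are found, allows any whitespace (`\s*`) between letters, quotes each character so regex metacharacters in a word are treated literally, and matches case-insensitively. It assumes the listed words start and end with a letter or digit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SpacedWordFilter {
    // One precompiled pattern per forbidden word.
    private final List<Pattern> patterns = new ArrayList<>();

    SpacedWordFilter(List<String> badWords) {
        for (String word : badWords) {
            StringBuilder regex = new StringBuilder("\\b");
            for (int i = 0; i < word.length(); i++) {
                if (i > 0) {
                    // Allow any amount of whitespace between letters.
                    regex.append("\\s*");
                }
                // Quote each character so it is matched literally.
                regex.append(Pattern.quote(String.valueOf(word.charAt(i))));
            }
            regex.append("\\b");
            patterns.add(Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE));
        }
    }

    boolean containsBadWords(String text) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(text).find()) {
                return true;
            }
        }
        return false;
    }
}
```

For example, `new SpacedWordFilter(List.of("smurf")).containsBadWords("hello, I am using a s m u r f word here")` should return true, while "smurfs" is not flagged because of the trailing word boundary. Be aware that allowing whitespace between letters can produce false positives when adjacent, innocent words happen to spell a listed word.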