I'm trying to censor certain words in a game's chat. The only issue is that a player can bypass my censor by adding extra characters onto the words. Here's an example.
//Check for rude words before sending to server
List<String> tokens = new ArrayList<String>();
tokens.add("bilbo");
tokens.add("baggins");
tokens.add("in");
tokens.add("the");
tokens.add("shire");
String patternString = "\\b(" + StringUtils.join(tokens, "|") + ")\\b";
Pattern pattern = Pattern.compile(patternString);
Matcher findRudeWords = pattern.matcher(result.toLowerCase());
while (findRudeWords.find()) {
    //Replace the bad word with asterisks
    String asterisk = StringUtils.leftPad("", findRudeWords.group(1).length(), '*');
    result = result.replaceAll("(?i)" + findRudeWords.group(1), asterisk);
}
The remaining issue is that if someone types bilbobaggins, without a space in between, my censor is easily avoided. How can I make a censor that also catches banned words inside longer strings, rather than only whole words?
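Take out the two word boundaries (the two \b's). With them in place, the pattern only matches a banned word that stands alone, so bilbo inside bilbobaggins is never found. I didn't want to bother with the extra library needed for StringUtils, so I modified your code a little (each match is replaced with a single asterisk), but here's what I tested with: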
import java.util.regex.*;
class Ideone {
    public static void main(String[] args) throws java.lang.Exception {
        // Check for rude words before sending to server
        String result = "heres bilbobaggins haha";
        String patternString = "(bilbo|baggins|in|the|shire)";
        Pattern pattern = Pattern.compile(patternString);
        Matcher findRudeWords = pattern.matcher(result.toLowerCase());
        while (findRudeWords.find()) {
            // Replace the bad word with asterisks
            result = result.replaceAll("(?i)" + findRudeWords.group(1), "*");
        }
        System.out.println("result=" + result);
    }
}
Output:
result=heres ** haha
And you can play with that here: http://ideone.com/72SU7X
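If you also want to keep the length-matched asterisks from the question without pulling in StringUtils, something along these lines might work. This is only a rough sketch, not a definitive implementation: the class name `CensorSketch` and the method `censor` are made up for illustration. It swaps the repeated `replaceAll` calls for `Matcher.appendReplacement`, so each match is replaced once, at its own position.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original code.
class CensorSketch {

    // Replaces every occurrence of a banned word, even inside a longer
    // string, with as many asterisks as the match has characters.
    static String censor(String input, List<String> bannedWords) {
        // Quote each word so regex metacharacters are treated literally.
        StringBuilder alternation = new StringBuilder();
        for (String word : bannedWords) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(word));
        }
        // No \b anchors, so matches inside longer strings are found too.
        Pattern pattern = Pattern.compile(alternation.toString(), Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(input);

        StringBuffer censored = new StringBuffer();
        while (matcher.find()) {
            // Build a run of '*' with the same length as the matched text.
            char[] stars = new char[matcher.group().length()];
            Arrays.fill(stars, '*');
            matcher.appendReplacement(censored, Matcher.quoteReplacement(new String(stars)));
        }
        matcher.appendTail(censored);
        return censored.toString();
    }

    public static void main(String[] args) {
        List<String> banned = Arrays.asList("bilbo", "baggins", "in", "the", "shire");
        System.out.println(censor("heres BilboBaggins haha", banned));
    }
}
```

With that input, the 12 characters of BilboBaggins come out as 12 asterisks. Keep in mind that without the word boundaries, short entries like in also match inside harmless words: censor("nothing", banned) gives noth**g.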