Java 8 here. I am given a list of blacklisted words/expressions as well as an input string. I need to determine if any of those blacklisted items appears in the input string:
List<String> blacklist = new ArrayList<>();
// populate the blacklist and "normalize" it by removing whitespace and converting to lower case
blacklist.add("Call for info".toLowerCase().replaceAll("\\s", ""));
blacklist.add("Travel".toLowerCase().replaceAll("\\s", ""));
blacklist.add("To be determined".toLowerCase().replaceAll("\\s", ""));
blacklist.add("Meals".toLowerCase().replaceAll("\\s", ""));
blacklist.add("Custom Call".toLowerCase().replaceAll("\\s", ""));
blacklist.add("Custom".toLowerCase().replaceAll("\\s", ""));
// obtain the input string and also "normalize" it
String input = getSomehow().toLowerCase().replaceAll("\\s", "");
// now determine if any blacklisted words/expressions appear inside the input
for (String blItem : blacklist) {
    if (input.contains(blItem)) {
        throw new RuntimeException("IMPOSSSSSSSIBLE!");
    }
}
I thought this was working great until my input string contained the word "Customer".
Since custom exists inside customer, the program throws an exception. Instead, I want it to be allowed, because "customer" is not a blacklisted word.
So I think the actual logic here is:
[a-z]
) character...I think that would cover all my bases.
Does Java 8 or any (Apache or otherwise) "commons" library have anything that will help me here? For some reason I'm having a hard time wrapping my head around this and making the code look elegant (I'm not sure how to check for the beginning/ending of a string from inside a regex, etc.).
Any ideas?
You can pre-compile a list of Patterns for the given words.
\b indicates a word boundary. Putting a word boundary on both sides of a word makes the regex match it only as a whole word, so custom no longer matches inside customer.
// needs java.util.regex.Pattern and java.util.stream.Collectors
List<Pattern> blackListPatterns =
    blacklist
        .stream()
        .map(word -> Pattern.compile("\\b" + Pattern.quote(word) + "\\b"))
        .collect(Collectors.toList());
Pattern.quote() escapes regex metacharacters. If you are sure your words will not contain any metacharacters such as ( or *, you can create the Pattern directly from the String instead of using Pattern.quote().
Then you can match the input against the list of Patterns:
for (Pattern pattern : blackListPatterns) {
    if (pattern.matcher(input).find()) {
        throw new RuntimeException("IMPOSSSSSSSIBLE!");
    }
}
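Putting the two pieces together, here is a minimal self-contained sketch. The class name BlacklistCheck and the helpers compile and containsBlacklisted are made up for illustration. It assumes the input keeps its whitespace, since \b can only find word boundaries if the spaces have not been stripped, and it uses Pattern.CASE_INSENSITIVE in place of toLowerCase(). Multi-word entries are matched literally here, so "Custom Call" expects exactly one space between the words.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BlacklistCheck {

    // Hypothetical helper: builds one case-insensitive, whole-word pattern per blacklist entry.
    public static List<Pattern> compile(List<String> blacklist) {
        return blacklist.stream()
                .map(item -> Pattern.compile(
                        "\\b" + Pattern.quote(item) + "\\b",
                        Pattern.CASE_INSENSITIVE))
                .collect(Collectors.toList());
    }

    // Hypothetical helper: true if any blacklisted entry occurs as a whole word in the input.
    public static boolean containsBlacklisted(String input, List<Pattern> patterns) {
        return patterns.stream().anyMatch(p -> p.matcher(input).find());
    }

    public static void main(String[] args) {
        List<Pattern> patterns = compile(Arrays.asList(
                "Call for info", "Travel", "To be determined",
                "Meals", "Custom Call", "Custom"));

        // "customer" only contains "custom" as a prefix, so it is allowed.
        System.out.println(containsBlacklisted("Our customer is happy", patterns)); // false
        // "custom" appears as a whole word, so it is rejected.
        System.out.println(containsBlacklisted("A custom order", patterns));        // true
    }
}
```

If the whitespace in the input is not always a single space, one option is to collapse runs of whitespace to one space in both the blacklist entries and the input before matching, rather than removing it entirely.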