I'm trying to make search keywords bold in result titles by replacing each keyword with <b>kw</b> using the replaceAll() method. Special characters in the keywords should also be ignored when highlighting. The code below is what I'm using, but it double-replaces the bold tag on the second pass. I'm looking for an elegant regex solution, since my alternative is becoming too big without covering all cases. For example, with this input:
addHighLight("a b", "abacus")
...I get this result:
<<b>b</b>>a</<b>b</b>><b>b</b><<b>b</b>>a</<b>b</b>>cus
public static String addHighLight(String kw, String text) {
    String highlighted = text;
    if (kw != null && !kw.trim().isEmpty()) {
        List<String> tokens = Arrays.asList(kw.split("[^\\p{L}\\p{N}]+"));
        for (String token : tokens) {
            try {
                highlighted = highlighted.replaceAll("(?i)(" + token + ")", "<b>$1</b>");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    return highlighted;
}
First, wrap the token in Pattern.quote(token) when building the pattern (unless kw is guaranteed to contain no regex metacharacters).
Second, if you want to stick with replaceAll() (instead of tokenizing the input into tag|text|tag|text|... and applying the replacement to the text parts only, which would have been much simpler and faster), the code below should help.
Note that it's not efficient: it matches some empty or already-highlighted spots and thus requires "curing" after substitution, but it should treat XML/HTML tags (except CDATA) properly.
Here's a "curing" function (no null checks):
private static Pattern cureDoubleB = Pattern.compile("<b><b>([^<>]*)</b></b>");
private static Pattern cureEmptyB = Pattern.compile("<b></b>");
private static String cure(String input) {
    return cureEmptyB.matcher(cureDoubleB.matcher(input).replaceAll("<b>$1</b>")).replaceAll("");
}
Here's what the replaceAll line should look like:
String txt = "[^<>" + Pattern.quote(token.substring(0, 1).toLowerCase()) + Pattern.quote(token.substring(0, 1).toUpperCase()) +"]*";
highlighted = cure(highlighted.replaceAll("((<[^>]*>)*"+txt+")(((?i)" + Pattern.quote(token) + ")|("+txt+"))", "$1<b>$4</b>$5"));
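For comparison, the simpler alternative mentioned above (touching only text and leaving tags alone) could be sketched roughly as follows. This is not the code from the question or a drop-in replacement for the snippet above; the class name KeywordHighlighter and method name highlight are made up for illustration. It assumes all keywords can be combined into one alternation and applied in a single pass, so a <b> tag inserted for one keyword is never rescanned for the next one. Like the version above, it does not handle CDATA.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class KeywordHighlighter {

    static String highlight(String kw, String text) {
        if (kw == null || text == null) {
            return text;
        }
        // Same tokenization as in the question: split on anything that is not a letter or digit.
        List<String> tokens = new ArrayList<>();
        for (String token : kw.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        if (tokens.isEmpty()) {
            return text;
        }
        // Longer tokens first, so "abc" is tried before its prefix "ab".
        tokens.sort(Comparator.comparingInt(String::length).reversed());

        List<String> quoted = new ArrayList<>();
        for (String token : tokens) {
            quoted.add(Pattern.quote(token));
        }
        // Alternative 1 consumes a whole tag; alternative 2 (group 1) is a keyword in text.
        Pattern pattern = Pattern.compile(
                "<[^>]*>|(" + String.join("|", quoted) + ")",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Tags are copied back unchanged; keywords are wrapped in <b>...</b>.
            String replacement = (m.group(1) != null) ? "<b>" + m.group(1) + "</b>" : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the input from the question, KeywordHighlighter.highlight("a b", "abacus") returns <b>a</b><b>b</b><b>a</b>cus. Adjacent highlights are not merged into one <b> element; that would need an extra step if it matters.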