I'm trying to run a project on Java 21 that currently runs on Java 17 without any problems.
Some of our regex patterns match in Java 21 where they did not match in Java 17, and vice versa.
It is reproducible with this simple code:
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDiff {

    public static void main(String[] args) {
        // English
        test(
            "...their sample variance, and σ2N their population variance.",
            "(?<![A-Z\\$€£¥฿฿=]-?[0-9\\.]{0,5})((\\b|\\-)[0-9]{1,5}[0-9,.]{0,5}(€|¥|฿|฿|°C|°F|°De?|°R[éeøa]?|(Z|E|P|T|G|M|k|h|da|d|c|m|µ|n|f|z|y)[ΩΩm]|[ΩΩ]|(Z|E|P|T|G|M|k|h|da|d|c|m|µ|n|p|f|a|z|y)?N|[kKMGTPEZY]i?B|[kmµnp]g|[Mk]t|kWh|GWa|MWd|MWh)(?!\\w))",
            true,
            null);
        // French
        test(
            "Il a été mis au banc de la société.",
            "\\bau (banc) (?:des nations|de la (?:société|ville|communauté|France)|de l['´‘’′](?:Europe|empire|église|islam))\\b",
            false,
            "au banc de la société");
    }

    // Prints the expected value and the last match of regex in text (null if there is none).
    private static void test(String text, String regex, boolean caseSensitive, String expected) {
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        Pattern pattern = Pattern.compile(regex, flags);
        Matcher matcher = pattern.matcher(text);
        String match = null;
        // find() resumes after the previous match, so an empty match cannot loop forever
        while (matcher.find()) {
            match = matcher.group();
        }
        System.out.println("Expected: " + expected);
        System.out.println("Got: " + match);
    }
}
```
Output with Java 17:

```
Expected: null
Got: null
Expected: au banc de la société
Got: au banc de la société
```

Output with Java 21:

```
Expected: null
Got: 2N
Expected: au banc de la société
Got: null
```
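Both differences appear to involve a `\b` next to a non-ASCII letter: σ directly before `2` in the English text, and é directly before the final `.` in the French text. The sketch below isolates just that `\b` with two short patterns (not the original ones) so the same file can be run on both JDKs; the class, field and method names are made up for this test, and the strings use Unicode escapes so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

public class WordBoundaryCheck {
    // Text fragments from the question, written with Unicode escapes:
    // Greek small sigma followed by "2N", and "societe" with e-acute followed by '.'
    static final String SIGMA_TEXT = "\u03c32N";
    static final String FRENCH_TEXT = "de la soci\u00e9t\u00e9.";
    static final String SIGMA_REGEX = "\\b2N";                 // boundary between sigma and 2?
    static final String FRENCH_REGEX = "soci\u00e9t\u00e9\\b"; // boundary between e-acute and '.'?

    static boolean found(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        int unicode = Pattern.UNICODE_CHARACTER_CLASS; // same effect as the inline flag (?U)
        System.out.println("Java " + Runtime.version().feature());
        // Default flags: these two results are the ones suspected to differ between JDKs
        System.out.println("default, sigma:  " + found(SIGMA_REGEX, 0, SIGMA_TEXT));
        System.out.println("default, french: " + found(FRENCH_REGEX, 0, FRENCH_TEXT));
        // UNICODE_CHARACTER_CLASS: sigma and e-acute count as word characters for \b
        System.out.println("(?U), sigma:     " + found(SIGMA_REGEX, unicode, SIGMA_TEXT));
        System.out.println("(?U), french:    " + found(FRENCH_REGEX, unicode, FRENCH_TEXT));
    }
}
```

If the `\b` change is the cause, the two `default` lines should flip between Java 17 and Java 21, while the two `(?U)` lines should print `false` and `true` on both.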
I expect the same behaviour in Java 21 as in Java 17.
Maybe "Regex \b Character Class Now Matches ASCII Characters only by Default (JDK-8264160)"; less likely "Support Unicode 14.0 (JDK-8268081)" (both from the Java 19 release notes), or "Support Unicode 15.0 (JDK-8284842)" (Java 20 release notes). – user85421
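If JDK-8264160 is the cause, one option is the `Pattern.UNICODE_CHARACTER_CLASS` flag (inline `(?U)`), but it also makes `\w`, `\d` and `\s` Unicode-aware, which would change the `(?!\w)` at the end of the English pattern. The sketch below instead approximates the old `\b` by rewriting each `\b` escape as a lookaround over `[\p{javaLetterOrDigit}_]`. It assumes that the pre-19 `\b` treated a character as a word character when `Character.isLetterOrDigit(ch)` was true or it was `_`; that is an assumption about the old implementation, and its special handling of combining marks is not reproduced. `withLegacyWordBoundary` and `lastMatch` are hypothetical helper names, and the escape scanner assumes the pattern contains no `\Q...\E` quoting, no `\b{g}` and no `\b` inside a character class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LegacyWordBoundary {
    // Assumed pre-JDK 19 notion of a word character: Character.isLetterOrDigit(ch) or '_'
    static final String WORD = "[\\p{javaLetterOrDigit}_]";
    // Boundary = word char on exactly one side (end of a word, or start of a word)
    static final String LEGACY_BOUNDARY =
            "(?:(?<=" + WORD + ")(?!" + WORD + ")|(?<!" + WORD + ")(?=" + WORD + "))";

    // Replaces every \b escape in a regex string with LEGACY_BOUNDARY.
    // Other escapes, including an escaped backslash followed by 'b', are copied unchanged.
    static String withLegacyWordBoundary(String regex) {
        StringBuilder sb = new StringBuilder(regex.length() + 64);
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                char next = regex.charAt(++i); // consume the escaped character
                if (next == 'b') {
                    sb.append(LEGACY_BOUNDARY);
                } else {
                    sb.append(c).append(next);
                }
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns the last match of regex in text, or null, like test() in the question.
    static String lastMatch(String text, String regex, int flags) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        String last = null;
        while (m.find()) {
            last = m.group();
        }
        return last;
    }

    public static void main(String[] args) {
        // expected: null (sigma and 2 are both word characters, so there is no boundary)
        System.out.println(lastMatch("\u03c32N", withLegacyWordBoundary("\\b2N"), 0));
        // expected: the French phrase "au banc de la societe" (with accents)
        System.out.println(lastMatch("Il a \u00e9t\u00e9 mis au banc de la soci\u00e9t\u00e9.",
                withLegacyWordBoundary("\\bau banc de la soci\u00e9t\u00e9\\b"),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE));
    }
}
```

In the question's `test` method this would amount to compiling `withLegacyWordBoundary(regex)` instead of `regex`, leaving the other escapes, including `(?!\w)`, untouched.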