java, regex, java-17, java-21

Java 21 vs Java 17 pattern matcher


I'm trying to run a project on Java 21 that currently runs on Java 17 without any problems.

Some of our regex patterns produce matches on Java 21 that they did not produce on Java 17, and vice versa.

It is reproducible with this simple code:

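// Imports needed by the snippet; both methods belong inside a class of your choice.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {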
        //english
        test(
                "...their sample variance, and σ2N their population variance.",
                "(?<![A-Z\\$€£¥฿฿=]-?[0-9\\.]{0,5})((\\b|\\-)[0-9]{1,5}[0-9,.]{0,5}(€|¥|฿|฿|°C|°F|°De?|°R[éeøa]?|(Z|E|P|T|G|M|k|h|da|d|c|m|µ|n|f|z|y)[ΩΩm]|[ΩΩ]|(Z|E|P|T|G|M|k|h|da|d|c|m|µ|n|p|f|a|z|y)?N|[kKMGTPEZY]i?B|[kmµnp]g|[Mk]t|kWh|GWa|MWd|MWh)(?!\\w))",
                true,
                null);
        //french
        test(
                "Il a été mis au banc de la société.",
                "\\bau (banc) (?:des nations|de la (?:société|ville|communauté|France)|de l['´‘’′](?:Europe|empire|église|islam))\\b",
                false,
                "au banc de la société");
    }

    private static void test(String text, String regex, boolean caseSensitive, String expected) {
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        Pattern pattern = Pattern.compile(regex, flags);
        Matcher matcher = pattern.matcher(text);
        int start = 0;
        String match = null;
        while (matcher.find(start)) {
            match = text.substring(matcher.start(), matcher.end());
            start = matcher.end();
        }
        System.out.println("Expected: " + expected);
        System.out.println("Got: " + match);
    }

Output with Java 17:

Expected: null
Got: null
Expected: au banc de la société
Got: au banc de la société

Output with Java 21:

Expected: null
Got: 2N
Expected: au banc de la société
Got: null

I expected Java 21 to behave the same as Java 17.


Solution

  • Maybe "Regex \b Character Class Now Matches ASCII Characters only by Default (JDK-8264160)"; less likely "Support Unicode 14.0 (JDK-8268081)" (both from the Java 19 release notes), or "Support Unicode 15.0 (JDK-8284842)" (Java 20 release notes). – user85421 2 days ago
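
If the \b change (JDK-8264160) is indeed the cause, both symptoms fit: with an ASCII-only \b, "σ" and "é" no longer count as word characters, so a boundary appears between "σ" and "2" and disappears between "é" and ".". A possible workaround is to compile with Pattern.UNICODE_CHARACTER_CLASS (or prefix the regex with (?U)), which makes \b Unicode-aware on both Java 17 and Java 21. The sketch below uses simplified stand-ins for the two patterns in the question; the class name WordBoundaryDemo and the helper find are made up for illustration. Note that this flag also switches \w, \d, \s and the POSIX classes to their Unicode definitions, so it is not guaranteed to reproduce the Java 17 results for every input.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {

    // Hypothetical helper: true if the regex finds at least one match in the text.
    static boolean find(String regex, String text, int flags) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file ASCII-only:
        // U+03C3 is the Greek small letter sigma, U+00E9 is e with an acute accent.
        String sigmaText = "\u03C32N";                // sigma directly followed by "2N"
        String frenchText = "la soci\u00E9t\u00E9.";  // accented word followed by a full stop
        String numberRegex = "\\b[0-9]+N";            // simplified stand-in for the first pattern
        String wordRegex = "soci\u00E9t\u00E9\\b";    // simplified stand-in for the second pattern

        // Without the flag the outcome depends on the JDK version, so it is only printed.
        System.out.println("default, number after sigma: " + find(numberRegex, sigmaText, 0));
        System.out.println("default, accented word end:  " + find(wordRegex, frenchText, 0));

        // With the flag, \b treats sigma and the accented e as word characters.
        int unicode = Pattern.UNICODE_CHARACTER_CLASS;
        System.out.println("unicode, number after sigma: " + find(numberRegex, sigmaText, unicode));
        System.out.println("unicode, accented word end:  " + find(wordRegex, frenchText, unicode));

        // The embedded flag (?U) is equivalent to passing UNICODE_CHARACTER_CLASS.
        System.out.println("(?U),    number after sigma: " + find("(?U)" + numberRegex, sigmaText, 0));
    }
}
```

In the test method from the question, this would amount to OR-ing Pattern.UNICODE_CHARACTER_CLASS into flags in both the case-sensitive and the case-insensitive branch.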