Search code examples
javaregexjava-9backtracking

Catastrophic Backtracking regular expression example in Java 9+


Does anyone have an example of a catastrophic backtracking in a regular expression, which works in Java 11? Most of the usual examples (like "(a+a+)+b") are fixed since java 9. The best would be one without back-references, no idea if it's possible in JDK 9+.

In our application we have a logic to control such backtracking, and to test that logic we used the expression "(x+x+)+y". After upgrading to JDK 11 it no longer causes the need behavior.


Solution

  • I am running java 14 and the backtracking issue still exists. It takes the following about 6 seconds to fail.

    String regex = "(a+a+a+a+a+a+)+b";
    String str = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac";
    System.out.println(str.matches(regex));
    

    This can be prevented by using independent group(s) (?>X). Once the group matches the first time, the engine checks to see if the following matches the b. If it doesn't, no need to repeat as backtracking will not change that outcome. So it fails instantly (even for larger expressions and/or test strings).

    String regex1 = "(?>(a+a+a+a+a+a+)+)b";
    System.out.println(str.matches(regex1));