Search code examples
javaregexstack-overflow

Pattern.matches() gives StackOverflowError


I'm using java's Pattern.matches to match a block of data to a regex. The block of data can be a single line or multiple lines. The problem is that once my data becomes more than 15 lines (typically more than 17-18 lines), i start getting stackoverflowerror. For data less than 15 lines the regex works fine.

The Regex is of this format:
domainname -> space -> , -> space -> number -> space -> , -> space -> number -> newline

String regex = "^(([a-zA-Z0-9][a-zA-Z0-9\\-]*\\.)+([a-zA-Z]{2,})\\s*,\\s*\\d+\\s*,\\s*\\d+(\\r?\\n)?)+$";

The data block i use to test against this regex is this

abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456
abc.com, 123, 456

This is the code:

String regex = "^(([a-zA-Z0-9][a-zA-Z0-9\\-]*\\.)+([a-zA-Z]{2,})\\s*,\\s*\\d+\\s*,\\s*\\d+(\\r?\\n)?)+$";
boolean valid = Pattern.matches(regex, data); //fails here

Solution

  • I can't tell you the reason for this error; the regex itself is fine and not subject to catastrophic backtracking or any other obvious error.

    Perhaps you can reduce the number of backtracking positions the regex engine saves by using possessive quantifiers (++ instead of +, *+ instead of *, {2,}+ instead of {2,} etc.). Also, you don't need the capturing groups (thanks Thomas), so I've changed them into non-capturing ones:

    "(?:(?:[a-zA-Z0-9][a-zA-Z0-9-]*+\\.)++([a-zA-Z]{2,}+)\\s*+,\\s*+\\d++\\s*+,\\s*+\\d++(\r?+\n)?+)++"
    

    This won't change the behaviour of the regex (except for the removal of the unnecessary anchors since you're using Pattern.matches()), but perhaps it helps avoid StackOverflows. I don't have a Java SDK installed, so I can't test it myself, though.