java, regex, stack-overflow

Java regex match stack overflow


Pattern eqPattern = Pattern.compile("(.*?)([a-z0-9\\_\\.]*) eq \"(((\\\\\")|[^\"])*)\"([\\s]*.*)", Pattern.CASE_INSENSITIVE);

This is my regex. When I try to match a long string, I get a StackOverflowError. The pattern should match input such as column1 eq "abc" and column ne "abc"; the ((\\")|[^"])* part is there to skip escaped quotes (\") inside the quoted value. How can I rewrite the regex to prevent the stack overflow?


Solution

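  • The best approach is to remove the alternation from the repeated group.
    java.util.regex matches a repeated alternation such as ((\\")|[^"])* recursively, so the stack depth grows with the length of the quoted text.
    The version below uses an unrolled loop instead: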

    "(.*?)([\\w.]*) eq \"([^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*)\"(\\s*.*)"

    Expanded (free-spacing) form of the same regex:

     ( .*? )                       # (1)
     ( [\w.]* )                    # (2)
     [ ] eq [ ] 
     "
     (                             # (3 start)
          [^"\\]* 
          (?:
               \\ [\S\s] 
               [^"\\]* 
          )*
     )                             # (3 end)
     "
     ( \s* .* )                    # (4)
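
A minimal sketch of how the unrolled pattern could be used from Java, assuming the same CASE_INSENSITIVE flag as in the question. The class name UnrolledEqPattern, the helper quotedValue, and the 100,000-character input are made up for illustration and are not part of the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnrolledEqPattern {
    // The unrolled-loop regex from the answer, written as a Java string literal.
    static final Pattern EQ_PATTERN = Pattern.compile(
            "(.*?)([\\w.]*) eq \"([^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*)\"(\\s*.*)",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the quoted value (group 3),
    // or null when the whole input does not match.
    static String quotedValue(String input) {
        Matcher m = EQ_PATTERN.matcher(input);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        // Build a long quoted value; 100_000 is an arbitrary size chosen
        // only to exercise the pattern on a long string.
        StringBuilder sb = new StringBuilder("column1 eq \"");
        for (int i = 0; i < 100_000; i++) {
            sb.append('a');
        }
        // Append an escaped quote inside the value, then the rest of the input.
        sb.append("\\\"tail\" and column ne \"abc\"");

        String value = quotedValue(sb.toString());
        System.out.println(value == null
                ? "no match"
                : "matched, value length " + value.length());
    }
}
```

Since [^"\\]* is a plain character-class loop, the long run of ordinary characters should be consumed without recursion; the repeated group is entered only once per backslash escape, so the stack depth should depend on the number of escapes rather than on the length of the string. The exact limits still depend on the JVM's thread stack size.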