java, regex, delay

Java RegEx Pattern takes too much time


I have a script (a Java String) that contains a lot of SQL and SQL operators such as <> (not equal to), ++ (increment operator) and := (assignment operator). By default, however, the characters of each operator are separated by one or more spaces: for example, <> appears as < > and ++ as + +, possibly with several spaces in between.

I wrote a simple regex to remove the spaces, shown below. Small scripts are no problem, but for larger scripts it takes a long time, sometimes 20-30 seconds.

Can you see any problem with this code and suggest something better?

p = Pattern.compile("(: +=|! +=|- +-|< +>|> +=|< +=|\\+ +=|\\'|\\/ +\\*|\\* +\\/|\\| +\\||\\< +\\<|\\> +\\>)");
m = p.matcher(script);

while (m.find()) {
    script = script.replace(m.group(), m.group().replaceAll(" +", ""));
}

Please suggest how I can reduce the time.

Thanks


Solution

  • I fear the long regexp with unions is a bit hard on the matcher. Why not a simpler solution like this:

    // replaceAll takes a regex; String.replace would treat the pattern as literal text
    str = str.replaceAll(": +=", ":=");
    str = str.replaceAll("! +=", "!=");
    str = str.replaceAll("- +-", "--");
    //... etc
    

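    It will be more legible and should be faster, because each call makes a single pass over the string, whereas the loop in the question rescans the whole script once per match.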
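
If one pass per operator is still too slow, the whole job can probably be done in a single pass. The following is only a sketch, not a tested drop-in for the asker's scripts: it keeps one alternation covering the operator pairs from the question (the bare `\\'` alternative is left out, since it contains no spaces to remove) and rebuilds the string with `Matcher.appendReplacement`. The names `OperatorSpaces`, `SPLIT_OPERATOR` and `collapse` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OperatorSpaces {
    // Operator pairs taken from the question's pattern, compiled once.
    // [:!><+] +=  covers ": =", "! =", "> =", "< =" and "+ =".
    private static final Pattern SPLIT_OPERATOR = Pattern.compile(
            "[:!><+] +=|- +-|< +>|/ +\\*|\\* +/|\\| +\\||< +<|> +>");

    // Removes the spaces inside each matched operator in one pass over the input.
    static String collapse(String script) {
        Matcher m = SPLIT_OPERATOR.matcher(script);
        StringBuffer out = new StringBuffer(script.length());
        while (m.find()) {
            // Strip the spaces from this match only; text between matches is copied as-is.
            String joined = m.group().replace(" ", "");
            m.appendReplacement(out, Matcher.quoteReplacement(joined));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: x := y + 1; IF a <> b
        System.out.println(collapse("x : = y + 1; IF a <  > b"));
    }
}
```

Because the output is built incrementally, the work grows with the length of the script rather than with the length multiplied by the number of matches.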