
How to improve regex performance in Java


I have this code to convert all the text that comes before "=" to uppercase:

Matcher m = Pattern.compile("((?:^|\n).*?=)").matcher(conteudo);
while (m.find()) {
  conteudo = conteudo.replaceFirst(m.group(1), m.group(1).toUpperCase());
}

But when the string is large, it becomes very slow, and I want to find a faster way to do this.

Any suggestions?

EDIT

I didn't explain this well. I have text like this:

field=value
field2=value2
field3=value3

And I want to convert each line like this:

FIELD=value
FIELD2=value2
FIELD3=value3

Solution

  • The fastest way to make a regex fast is to not use a regex at all. Regular expressions were never meant to be, and almost never are, a good choice for performance-sensitive operations. (Further reading: Why are regular expressions so controversial?)

    Try using String class methods instead, or write a custom method that does what you want. For instance, split the text on '=' and call .toUpperCase() on the trailing part of each piece (what comes after \n). Alternatively, convert the text to a char[] (or use charAt()) and traverse it manually, uppercasing characters after each newline and leaving them unchanged after each '='.

    For example:

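    public static String changeCase( String s ) {
        // The start of the text counts as the start of a line, i.e. a key.
        boolean capitalize = true;
        int len = s.length();
        char[] output = new char[len];
        for( int i = 0; i < len; i++ ) {
          char input = s.charAt(i);
          if ( input == '\n' ) {
            // New line: the following characters belong to a key again.
            capitalize = true;
            output[i] = input;
          } else if ( input == '=' ) {
            // End of the key: copy the rest of the line unchanged.
            capitalize = false;
            output[i] = input;
          } else {
            // Note: a line with no '=' at all is uppercased entirely.
            output[i] = capitalize ? Character.toUpperCase(input) : input;
          }
        }
        return new String(output);
    }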
    

    Method input:

    field=value\n
    field2=value2\n
    field3=value3
    

    Method output:

    FIELD=value\n
    FIELD2=value2\n
    FIELD3=value3
    

    Try it here: http://ideone.com/k0p67j
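
    The "String class methods" route mentioned above can be sketched line by line: split the text on newlines, find the first '=' in each line with indexOf, and uppercase only the part before it. This is a minimal sketch, not the answer's own code; the class name KeyUpperCaser is hypothetical, and leaving lines without '=' unchanged is an assumption (the question doesn't say what should happen to them). Locale.ROOT is used so the result doesn't depend on the default locale, matching the locale-independent Character.toUpperCase in changeCase above.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
final class KeyUpperCaser {

    // Uppercases the text before the first '=' on each line.
    // Lines without '=' are copied unchanged (an assumption).
    static String upperCaseKeys(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // A limit of -1 keeps trailing empty strings, so a final "\n" survives.
        String[] lines = text.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int eq = line.indexOf('=');
            if (eq >= 0) {
                out.append(line.substring(0, eq).toUpperCase(Locale.ROOT)) // key
                   .append(line, eq, line.length());                      // "=value"
            } else {
                out.append(line);
            }
            if (i < lines.length - 1) {
                out.append('\n'); // re-insert the separator removed by split
            }
        }
        return out.toString();
    }
}
```

    Each line is visited once and the output is built in a single StringBuilder, so the work stays roughly linear in the size of the text.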

    PS, as Jamie Zawinski put it:

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
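
    For comparison, if you would rather keep a regex, much of the slowdown in the question's loop likely comes from String.replaceFirst rather than from the regex itself: each call compiles m.group(1) as a brand-new pattern (so keys containing characters like '.' or '(' can also misbehave), scans from the start of the string, and builds a whole new copy of it. That makes the total work grow roughly with (number of lines × text length). A single pass with Matcher.appendReplacement/appendTail avoids the repeated copying. This is a sketch under assumptions: the class name RegexKeyUpperCaser is hypothetical, and the pattern (?m)^[^=\n]*= (from the start of a line up to its first '=') is a rewrite of the question's pattern, not the asker's code.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class RegexKeyUpperCaser {

    // (?m) makes ^ match at the start of every line; [^=\n]* cannot cross a
    // line break, so a line with no '=' produces no match and is left as-is.
    // Compiled once and reused instead of on every call.
    private static final Pattern KEY = Pattern.compile("(?m)^[^=\n]*=");

    static String upperCaseKeys(String text) {
        Matcher m = KEY.matcher(text);
        // StringBuffer works on Java 8; Java 9+ also accepts a StringBuilder.
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            String key = m.group().toUpperCase(Locale.ROOT);
            // quoteReplacement keeps '$' or '\' in a key from being read
            // as a group reference in the replacement string.
            m.appendReplacement(out, Matcher.quoteReplacement(key));
        }
        m.appendTail(out); // copy everything after the last match
        return out.toString();
    }
}
```

    The matcher walks the text once, and the output buffer only grows, so this stays roughly linear as well. The char-array loop in the answer is still likely to be faster, because it skips the regex engine entirely.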