Java: String.replaceAll() vs matcher.replaceAll() in a loop


This is probably a very simple question, and likely a duplicate (although I did try to check beforehand), but which is less expensive when used in a loop: String.replaceAll() or matcher.replaceAll()?
While I was told

Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
Matcher matcher;
String thisWord;
while (Scanner.hasNext()) {
   matcher = regexPattern.matcher(Scanner.next());
   thisWord = matcher.replaceAll("");
   ...
} 

is better, because you only have to compile the regex once, I would think that the benefits of

String thisWord;
while (Scanner.hasNext()) {
   thisWord = Scanner.next().replaceAll("[^a-zA-Z0-9]","");
   ...
}

far outweigh those of the matcher method, due to not having to initialize the matcher every time. (I understand the matcher exists already, so you are not recreating it.)

Can someone please explain where my reasoning goes wrong? Am I misunderstanding what Pattern.matcher() does?


Solution

  • In OpenJDK, String.replaceAll is defined as follows:

        public String replaceAll(String regex, String replacement) {
            return Pattern.compile(regex).matcher(this).replaceAll(replacement);
        }
    

    So at least with that implementation, it won't give better performance than compiling the pattern only once and using Matcher.replaceAll.

    It's possible that there are other JDK implementations where String.replaceAll is implemented differently, but I'd be very surprised if there were any where it performed better than Matcher.replaceAll.
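To make the comparison concrete, here is a minimal sketch of the two approaches side by side. The class and method names (ReplaceAllComparison, viaString, viaPattern) are made up for illustration and do not come from the question; the regex is the one the asker uses. Both methods should return the same string for any input; the only difference is where the regex gets compiled.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class ReplaceAllComparison {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^a-zA-Z0-9]");

    // Per the OpenJDK definition quoted above, this compiles the regex on every call.
    static String viaString(String word) {
        return word.replaceAll("[^a-zA-Z0-9]", "");
    }

    // Reuses the precompiled pattern; only a new Matcher is created per call.
    static String viaPattern(String word) {
        return NON_ALPHANUMERIC.matcher(word).replaceAll("");
    }

    public static void main(String[] args) {
        String[] words = {"hello,", "wor-ld!", "abc123", "***"};
        for (String word : words) {
            // Both columns should be identical on every line.
            System.out.println("[" + viaString(word) + "] [" + viaPattern(word) + "]");
        }
    }
}
```

Inside a loop, viaPattern avoids the repeated Pattern.compile call, which is the whole point of hoisting the pattern out of the loop.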


    > […] due to not having to initialize the matcher every time. (I understand the matcher exists already, so you are not recreating it.)

    I think you have a misunderstanding here. Pattern.matcher() returns a new Matcher instance on every call, so you really do create one on each loop iteration. That is very cheap, though, and not something to be concerned about performance-wise.
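The point about Matcher creation can be checked directly. The sketch below (MatcherReuseSketch and its method names are hypothetical) first shows that two calls to Pattern.matcher() yield distinct objects, and then shows an alternative that is not part of the original answer: reusing a single Matcher via Matcher.reset(CharSequence). Since creating a Matcher is already cheap, the second form is rarely worth it, and a Matcher, unlike a Pattern, is not safe to share between threads.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class MatcherReuseSketch {
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^a-zA-Z0-9]");

    // Each call to Pattern.matcher() returns a distinct Matcher object.
    static boolean createsNewMatcherEachTime() {
        Matcher first = NON_ALPHANUMERIC.matcher("a-b");
        Matcher second = NON_ALPHANUMERIC.matcher("c-d");
        return first != second;
    }

    // Alternative: keep one Matcher and point it at new input with reset().
    static String[] stripAll(String[] words) {
        Matcher matcher = NON_ALPHANUMERIC.matcher("");
        String[] result = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            // reset(CharSequence) swaps in the new input and returns the same Matcher.
            result[i] = matcher.reset(words[i]).replaceAll("");
        }
        return result;
    }
}
```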


    Incidentally, you don't actually need a separate 'matcher' variable if you don't want one; you'll get exactly the same behavior and performance if you write:

       thisWord = regexPattern.matcher(Scanner.next()).replaceAll("");
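For completeness, here is a self-contained version of the loop using that one-liner. It assumes that `Scanner` in the snippets above stands for a java.util.Scanner instance (hasNext() and next() are instance methods), here named `scanner` and reading from a String for the sake of a runnable example. WordCleaner and cleanWords are hypothetical names, and collecting the results into a list is a stand-in for whatever the elided loop body does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class WordCleaner {
    // Compiled once, outside the loop.
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^a-zA-Z0-9]");

    // Splits the text on whitespace and strips non-alphanumeric characters from each token.
    static List<String> cleanWords(String text) {
        List<String> cleaned = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                // No separate Matcher variable is needed.
                cleaned.add(NON_ALPHANUMERIC.matcher(scanner.next()).replaceAll(""));
            }
        }
        return cleaned;
    }
}
```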