This is probably an incredibly simple question, and likely a duplicate (although I did try to check beforehand), but which is less expensive when used in a loop: String.replaceAll() or Matcher.replaceAll()?
While I was told
    Pattern regexPattern = Pattern.compile("[^a-zA-Z0-9]");
    Matcher matcher;
    String thisWord;
    while (Scanner.hasNext()) {
        matcher = regexPattern.matcher(Scanner.next());
        thisWord = matcher.replaceAll("");
        ...
    }
is better, because you only have to compile the regex once, I would think that the benefits of
    String thisWord;
    while (Scanner.hasNext()) {
        thisWord = Scanner.next().replaceAll("[^a-zA-Z0-9]", "");
        ...
    }
far outweigh the matcher method, due to not having to initialize the matcher every time. (I understand the matcher exists already, so you are not recreating it.)
Can someone please explain how my reasoning is false? Am I misunderstanding what Pattern.matcher() does?
In OpenJDK, String.replaceAll is defined as follows:
    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }
So, at least with that implementation, it can't perform better than compiling the pattern only once and using Matcher.replaceAll: it does the same work, plus a regex compilation on every call.
It's possible that there are other JDK implementations where String.replaceAll is implemented differently, but I'd be very surprised if there were any where it performed better than Matcher.replaceAll.
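To make the comparison concrete, here is a minimal sketch of the two approaches side by side. The class name ReplaceAllDemo and the helper names viaString and viaPattern are made up for illustration; only Pattern.compile, Pattern.matcher, Matcher.replaceAll and String.replaceAll are real JDK calls.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class ReplaceAllDemo {
    // Compiled once and shared; Pattern instances are immutable.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    // Compiles the regex again on every call (per the OpenJDK definition quoted above).
    static String viaString(String word) {
        return word.replaceAll("[^a-zA-Z0-9]", "");
    }

    // Reuses the precompiled pattern; only a new Matcher is created per call.
    static String viaPattern(String word) {
        return NON_ALNUM.matcher(word).replaceAll("");
    }

    public static void main(String[] args) {
        String[] words = {"Hello,", "wor-ld!", "abc123"};
        for (String w : words) {
            // Both approaches produce the same cleaned word.
            System.out.println(viaString(w) + " | " + viaPattern(w));
        }
    }
}
```

Both methods return the same strings; the only difference is how often the regex gets compiled. How much that matters in practice depends on the pattern and the number of iterations, so it is worth measuring on your own input.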
[…] due to not having to initialize the matcher every time. (I understand the matcher exists already, so you are not recreating it.)
I think you have a misunderstanding here. You really do create a new Matcher instance on each loop iteration; but that is very cheap, and not something to be concerned about performance-wise.
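If you ever did want to avoid creating a new Matcher per iteration (I doubt it would show up in a profile), Matcher.reset(CharSequence) lets you point an existing matcher at new input. A rough sketch, where the class name MatcherReuseDemo and the method stripAll are hypothetical and an array of words stands in for the Scanner loop:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MatcherReuseDemo {
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    static String[] stripAll(String[] words) {
        // One Matcher for the whole loop; the empty initial input is just a placeholder.
        Matcher matcher = NON_ALNUM.matcher("");
        String[] out = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            // reset(CharSequence) reuses the matcher on the next input.
            out[i] = matcher.reset(words[i]).replaceAll("");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : stripAll(new String[] {"Hello,", "wor-ld!"})) {
            System.out.println(s);
        }
    }
}
```

Keep in mind that a Matcher is not safe to share between threads, so this only makes sense for a matcher confined to a single loop like the one above.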
Incidentally, you don't actually need a separate 'matcher' variable if you don't want one; you'll get exactly the same behavior and performance if you write:
thisWord = regexPattern.matcher(Scanner.next()).replaceAll("");