Let's say I'm constructing a set of Strings where each String is a prefix of the next one. For example, imagine I write a function:
public Set<String> example(List<String> strings) {
    Set<String> result = new HashSet<>();
    String incremental = "";
    for (String s : strings) {
        incremental = incremental + ":" + s;
        result.add(incremental);
    }
    return result;
}
Would it ever be worthwhile to rewrite it to use a StringBuilder rather than concatenation? Obviously that would avoid constructing a new StringBuilder in each iteration of the loop, but I'm not sure whether that would be a significant benefit for large lists or whether the overhead that you normally want to avoid by using StringBuilders in loops is mostly just the unnecessary String constructions.
This answer is only correct for Java 8; as @user85421 points out, + on strings is no longer compiled to StringBuilder operations in Java 9 and later.
Theoretically at least, there is still a reason to use a StringBuilder in your example.
Let's consider how string concatenation works: the assignment incremental = incremental + ":" + s; actually creates a new StringBuilder, appends incremental to it by copying, then appends ":" to it by copying, then appends s to it by copying, then calls toString() to build the result by copying, and assigns a reference to the new string to the variable incremental. The total number of characters copied from one place to another is (N + 1 + s.length()) * 2, where N is the original length of incremental, because every character is copied into the StringBuilder's buffer once, and then back out again once.
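To make that expansion concrete, here is a rough sketch of the loop with the concatenation written out by hand. It assumes the Java 8 behaviour described above; the class name DesugaredConcat is made up for illustration, and the bytecode a real compiler emits may differ in detail.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name, used only for this illustration.
class DesugaredConcat {

    // Same behaviour as the question's example(), with the concatenation
    // expanded by hand into the StringBuilder calls described above.
    static Set<String> example(List<String> strings) {
        Set<String> result = new HashSet<>();
        String incremental = "";
        for (String s : strings) {
            incremental = new StringBuilder() // fresh builder on every iteration
                    .append(incremental)      // copies the N existing characters in
                    .append(":")              // copies 1 character in
                    .append(s)                // copies s.length() characters in
                    .toString();              // copies everything back out
            result.add(incremental);
        }
        return result;
    }

    public static void main(String[] args) {
        // HashSet iteration order is unspecified, so the printed order may vary.
        System.out.println(example(Arrays.asList("a", "b", "c")));
    }
}
```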
In contrast, if you use a StringBuilder explicitly - the same StringBuilder across all iterations - then inside the loop you would write incremental.append(":").append(s); and then explicitly call toString() to build the string to add to the set. The total number of characters copied here would be (1 + s.length()) * 2 + N, because the ":" and s have to be copied in and out of the StringBuilder, but the N characters from the previous state only have to be copied out of the StringBuilder in the toString() method; they don't also have to be copied in, because they were already there.
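The rewrite described above might look something like this. It is a minimal sketch rather than a tuned implementation; the class name PrefixSetBuilder is made up for illustration, and the method body follows the question's example().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name, used only for this illustration.
class PrefixSetBuilder {

    static Set<String> example(List<String> strings) {
        Set<String> result = new HashSet<>();
        // One builder shared by all iterations; it keeps the prefix built so far.
        StringBuilder incremental = new StringBuilder();
        for (String s : strings) {
            incremental.append(":").append(s);  // only the new characters are copied in
            result.add(incremental.toString()); // the whole prefix is copied out once
        }
        return result;
    }

    public static void main(String[] args) {
        // HashSet iteration order is unspecified, so the printed order may vary.
        System.out.println(example(Arrays.asList("a", "b", "c")));
    }
}
```

The call to toString() is still needed on every iteration, because the set has to hold an independent String for each prefix.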
So, by using a StringBuilder instead of concatenation, you are copying N fewer characters into the buffer on each iteration, and the same number of characters out of the buffer. The value of N grows from 0 to the sum of the lengths of all of the strings (plus the number of colons), so the total saving is quadratic in the sum of the lengths of the strings. That means the saving could be quite significant; I'll leave it to someone else to do the empirical measurements to see how significant it is.
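As a rough check on the arithmetic (not a benchmark), the two formulas can be added up over a list of string lengths. The sketch below only counts characters under the copying model used in this answer; it ignores anything the model leaves out, such as the StringBuilder growing its internal buffer. The class and method names are made up for illustration.

```java
// Hypothetical class name, used only for this illustration.
class CopyCountModel {

    // Characters copied by repeated concatenation: (N + 1 + len) * 2 per iteration.
    static long concatCopies(int[] lengths) {
        long total = 0;
        long n = 0; // current length of incremental
        for (int len : lengths) {
            total += (n + 1 + len) * 2;
            n += 1 + len; // the colon plus the new string
        }
        return total;
    }

    // Characters copied with one shared StringBuilder: (1 + len) * 2 + N per iteration.
    static long builderCopies(int[] lengths) {
        long total = 0;
        long n = 0; // length already in the builder before this iteration
        for (int len : lengths) {
            total += (1 + len) * 2 + n;
            n += 1 + len;
        }
        return total;
    }

    public static void main(String[] args) {
        // Doubling the number of strings shows how the saving grows.
        for (int count : new int[] {1000, 2000, 4000}) {
            int[] lengths = new int[count];
            java.util.Arrays.fill(lengths, 9); // every string is 9 characters long
            long saving = concatCopies(lengths) - builderCopies(lengths);
            System.out.println(count + " strings: saving = " + saving + " characters");
        }
    }
}
```

Under this model the saving is the sum of N over all iterations, so for strings of equal length it grows roughly fourfold each time the list doubles.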