I've been researching a question that was presented to me: How to write a function that takes a string as input and returns a string with spaces between the characters. The function is to be written to optimize performance when it is called thousands of times per second.
I know that .net has a function called String.Join
, to which I may pass in the space character as a separator along with the original string.
Barring the use of String.Join
, I can use the StringBuilder
class to append spaces after each character.
Another way to accomplish this task is to declare a character array with 2*n-1 characters (You have to add n-1 characters for the spaces). The character array can be filled in a loop and then passed to the String constructor
.
I've written some .net code that runs each of these algorithms one millions times each with the parameter "Hello, World"
and measures how long it takes to execute. Method (3) is much, much faster than (1) or (2).
I know that (3) should be very fast because it avoids creating any additional string references to be garbage collected, but it seems to me that a built-in .net function such as String.Join
should yield good performance. Why is using String.Join
so much slower than doing the work by hand?
public static class TestClass
{
// 491 milliseconds for 1 million iterations
public static string Space1(string s)
{
return string.Join(" ", s.AsEnumerable());
}
//190 milliseconds for 1 million iterations
public static string Space2(string s)
{
if (s.Length < 2)
return s;
StringBuilder sb = new StringBuilder();
sb.Append(s[0]);
for (int i = 1; i < s.Length; i++)
{
sb.Append(' ');
sb.Append(s[i]);
}
return sb.ToString();
}
// 50 milliseconds for 1 million iterations
public static string Space3(string s)
{
if (s.Length < 2)
return s;
char[] array = new char[s.Length * 2 - 1];
array[0] = s[0];
for (int i = 1; i < s.Length; i++)
{
array[2*i-1] = ' ';
array[2*i] = s[i];
}
return new string(array);
}
Update: I have changed my project to "Release" mode and updated my elapsed times in the question accordingly.
Why is using String.Join so much slower than doing the work by hand?
The reason String.Join
is slower in this case is that you can write an algorithm that has prior knowledge of the exact nature of your IEnumerable<T>
.
String.Join<T>(string, IEnumerable<T>)
(the overload you're using), on the other hand, is intended to work with any arbitrary enumerable type, which means it cannot pre-allocate to the proper size. In this case, it's trading flexibility for pure performance and speed.
Many of the framework methods do handle certain cases where things could be sped up by checking for conditions, but this typically is only done when that "special case" is going to be common.
In this case, you're effectively creating an edge case where a hand-written routine will be faster, but it is not a common use case of String.Join
. In this case, since you know, exactly, in advance what is required, you have the ability to avoid all of the overhead required to have a flexible design by pre-allocating an array of exactly the right size, and building the results manually.
You'll find that, in general, it's often possible to write a method that will out perform some of the framework routines for specific input data. This is common, as the framework routines have to work with any dataset, which means that you can't optimize for a specific input scenario.