When to use ReadOnlySpan<T> vs explicit / overloaded types


For the sake of this question, assume that I want to convert a plain-text string into a hexadecimal string; for example, "Hello, World!" into "48656C6C6F2C20576F726C6421".

This requires two steps:

  1. Convert the plain text string to a byte array.
  2. Convert the byte array to a hexadecimal string.

For example:

using System;
using System.Text;

string plainText = "Hello, World!";
byte[] bytes = Encoding.Default.GetBytes(plainText);   // step 1: text -> bytes
string hexadecimalString = Convert.ToHexString(bytes); // step 2: bytes -> uppercase hex

Next, I want to wrap this functionality in a utility function. Before .NET Core 2.1 (and I'm surprised ReadOnlySpan<T> is that old), the most flexible approach would be to have overloads for string and char[]; for example:

public static string ToHexString(string value, Encoding? encoding = null) =>
  ToHexString(value.ToCharArray(), encoding);

public static string ToHexString(char[] value, Encoding? encoding = null) =>
  Convert.ToHexString((encoding ?? Encoding.Default).GetBytes(value));

Whilst this approach is relatively trivial, it's not particularly nice from an allocation perspective, as a call to the string overload requires three allocations:

  1. ToCharArray() will allocate a new char[].
  2. GetBytes will allocate a new byte[].
  3. ToHexString will allocate a new string.

Introducing ReadOnlySpan<T> has the potential to improve this, both by reducing the number of allocations (albeit only by one) and by improving maintainability: fewer method overloads are needed because string and char[] are both implicitly convertible to ReadOnlySpan<char>; for example (in an ideal world):

public static string ToHexString(ReadOnlySpan<char> value, Encoding? encoding = null) =>
  Convert.ToHexString((encoding ?? Encoding.Default).GetBytes(value));
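
The implicit conversions themselves can be seen in isolation. The following is a minimal sketch; SpanDemo and CountUpper are hypothetical names used purely for illustration and are not part of the question's utility:

```csharp
using System;

public static class SpanDemo
{
    // Hypothetical helper: counts the uppercase characters in any
    // contiguous run of chars, whatever type backs it.
    public static int CountUpper(ReadOnlySpan<char> value)
    {
        int count = 0;
        foreach (char c in value)
        {
            if (char.IsUpper(c))
                count++;
        }
        return count;
    }
}
```

A single signature then accepts a string (SpanDemo.CountUpper("Hello, World!")), a char[] (SpanDemo.CountUpper(new[] { 'A', 'b' })), or a slice (SpanDemo.CountUpper("Hello, World!".AsSpan(7))) without any overloads and without copying the characters.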

Ironically, whilst ReadOnlySpan<T> should theoretically reduce the number of allocations, the problem here is that GetBytes has no overload that takes a ReadOnlySpan<char> and returns a byte[] (the span overload writes into a caller-supplied Span<byte> instead), so the most direct way to implement this is to reintroduce the allocation; for example:

public static string ToHexString(ReadOnlySpan<char> value, Encoding? encoding = null) =>
  Convert.ToHexString((encoding ?? Encoding.Default).GetBytes(value.ToArray()));

Keeping the same one-line style, the way to avoid that extra allocation is to provide method overloads and implement each of them separately; for example:

public static string ToHexString(string value, Encoding? encoding = null) =>
  Convert.ToHexString((encoding ?? Encoding.Default).GetBytes(value));
    
public static string ToHexString(char[] value, Encoding? encoding = null) =>
  Convert.ToHexString((encoding ?? Encoding.Default).GetBytes(value));

Now we have methods that result in fewer allocations, as there are no conversions from string to char[] or from ReadOnlySpan<char> to char[]. The tradeoff is a higher maintenance cost, as I now have two methods to maintain, and a less flexible API for the caller.

So, my question is, when is it right to use ReadOnlySpan<T> as a method parameter vs. using more explicit / overloaded types? Should the tradeoff be biased towards maintainability, and a robust API leaning on implicit type conversion, or towards performance? Are there any guidelines for this?


Solution

  • So, my question is, when is it right to use ReadOnlySpan<T> as a method parameter vs. using more explicit / overloaded types? Should the tradeoff be biased towards maintainability, and a robust API leaning on implicit type conversion, or towards performance? Are there any guidelines for this?

    While this is somewhat opinion-based, there is a common saying: premature optimization is the root of all evil.

    In this context it would mean to bias toward maintainability by default, since most code is not performance sensitive.

    But you should profile the code to find the parts that are performance sensitive, and optimize those parts. This may or may not include your ToHexString function. While allocations can be harder to profile than CPU time, a profiler should at least report things like allocation rate and time spent in GC. You can also add manual instrumentation if you want more detailed data.
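
As one possible form of manual instrumentation, the sketch below reads GC.GetAllocatedBytesForCurrentThread() (available since .NET Core 3.0) before and after a call. AllocationProbe and MeasureAllocatedBytes are hypothetical names, and the approach is only a rough probe, not a substitute for a profiler or a benchmarking harness:

```csharp
using System;

public static class AllocationProbe
{
    // Hypothetical helper: returns the number of bytes allocated on the
    // current thread while `action` runs once. Work done on other threads
    // is not counted.
    public static long MeasureAllocatedBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

For example, AllocationProbe.MeasureAllocatedBytes(() => Convert.ToHexString(Encoding.UTF8.GetBytes("Hello, World!"))) reports the combined cost of the byte[] and the resulting string. Calling the code once beforehand as a warm-up helps keep one-time JIT and initialization allocations out of the number.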

    I would also not put such a helper function in any public API, so that you only have your own code to worry about, which greatly simplifies maintenance. Public APIs are more difficult: you probably need to decide on a case-by-case basis whether to provide multiple overloads, based on the expected usage patterns.
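
If profiling does show this helper to be a hot path, a single ReadOnlySpan<char> overload can still avoid the intermediate arrays, because Encoding exposes GetByteCount(ReadOnlySpan<char>) and GetBytes(ReadOnlySpan<char>, Span<byte>), and Convert.ToHexString accepts a ReadOnlySpan<byte> on .NET 5 and later. The following is a minimal sketch under those assumptions; the HexUtility class name and the StackLimit threshold of 256 bytes are illustrative choices, not established guidance:

```csharp
using System;
using System.Buffers;
using System.Text;

public static class HexUtility
{
    // Hypothetical threshold: encoded text up to this many bytes is
    // buffered on the stack; anything larger uses a pooled array.
    private const int StackLimit = 256;

    public static string ToHexString(ReadOnlySpan<char> value, Encoding? encoding = null)
    {
        encoding ??= Encoding.Default;

        // Exact number of bytes the encoded text will occupy.
        int byteCount = encoding.GetByteCount(value);

        byte[]? rented = null;
        Span<byte> buffer = byteCount <= StackLimit
            ? stackalloc byte[StackLimit]
            : (rented = ArrayPool<byte>.Shared.Rent(byteCount));

        try
        {
            // Encode directly into the caller-supplied buffer.
            int written = encoding.GetBytes(value, buffer);

            // Only the bytes actually written are converted to hex.
            return Convert.ToHexString(buffer[..written]);
        }
        finally
        {
            if (rented is not null)
                ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

Both HexUtility.ToHexString("Hello, World!", Encoding.UTF8) and the same call with a char[] argument bind to this one method through the implicit conversions and return "48656C6C6F2C20576F726C6421". For short inputs the only heap allocation is the resulting string. Whether the extra code is worth it compared with two one-line overloads is exactly the kind of decision the profiling data should drive.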