
When is it safe to use 'unsafe string modifications' in C#?


// Length of the encoded output.
private const int RESULT_LENGTH = 10;

// Requires compiling with unsafe code enabled (AllowUnsafeBlocks).
public static unsafe string Encode1(byte[] data)
{
    var result = new string('0', RESULT_LENGTH); // memory allocation: a new string nobody else references yet

    // Pin the string and overwrite its characters in place.
    fixed (char* c = result)
    {
        for (int i = 0; i < RESULT_LENGTH; i++)
        {
            c[i] = DetermineChar(data, i);
        }
    }

    return result;
}

public static string Encode2(byte[] data)
{
    var chars = new char[RESULT_LENGTH]; // memory allocation

    for (int i = 0; i < RESULT_LENGTH; i++)
    {
        chars[i] = DetermineChar(data, i);
    }

    return new string(chars); // again a memory allocation (the array is copied into the string)
}

private static char DetermineChar(byte[] data, int index)
{
    // Dummy algorithm: ignores its arguments.
    return 'a';
}

Both methods encode a byte array into a string according to some specific algorithm. The first creates a string and writes its individual chars through a pointer. The second creates an array of chars and then uses that array to instantiate a string.
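For comparison, here is a minimal sketch of a variant of Encode2 that avoids the intermediate heap array without ever writing to string memory. It assumes .NET Core 2.1 or later (C# 7.2+); the class `StackEncoder` and method `Encode4` are hypothetical names. The scratch buffer is a stack-allocated `Span<char>`, and `new string(ReadOnlySpan<char>)` copies it into the only heap allocation.

```csharp
using System;

public static class StackEncoder
{
    private const int RESULT_LENGTH = 10;

    public static string Encode4(byte[] data)
    {
        // Scratch buffer on the stack; no unsafe context is needed for Span<char>.
        Span<char> chars = stackalloc char[RESULT_LENGTH];

        for (int i = 0; i < chars.Length; i++)
        {
            chars[i] = DetermineChar(data, i);
        }

        // Copies the span into a new string: the only heap allocation here.
        return new string(chars);
    }

    // Same dummy algorithm as in the question.
    private static char DetermineChar(byte[] data, int index) => 'a';
}
```

This only makes sense because RESULT_LENGTH is small and fixed; for large or caller-controlled lengths, `stackalloc` risks a stack overflow.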

I know strings are immutable and that multiple string variables can reference the same string instance in memory. Also, according to this article, you should not use unsafe string modifications unless it is absolutely necessary.
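A small sketch of the sharing mentioned above, with hypothetical class and method names: identical literals in one assembly resolve to the same interned instance, whereas `new string('0', 10)` (as used in Encode1) returns a freshly allocated object for a non-zero length. That freshness is why Encode1 writes to a string that no other code can see yet.

```csharp
public static class SharingDemo
{
    // Two methods returning the same literal: within one assembly both
    // calls hand back the same interned string instance.
    public static string LiteralA() => "0000000000";
    public static string LiteralB() => "0000000000";

    // What Encode1 does: the string constructor allocates a new object
    // on every call (for count > 0).
    public static string Fresh() => new string('0', 10);
}
```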

My question: When is it safe to use 'unsafe string modifications' as used in the Encode1 sample code?

PS: I'm aware of newer concepts such as Span and Memory, and the string.Create method. I'm just curious about this specific case.
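For reference, a minimal sketch of Encode1 rewritten with `string.Create` (available since .NET Core 2.1). `SpanEncoder` and `Encode3` are hypothetical names. The runtime allocates the string once and lets the callback fill it through a `Span<char>` before the string is returned, so the "write into a string nobody has seen yet" pattern happens without `unsafe` code.

```csharp
using System;

public static class SpanEncoder
{
    private const int RESULT_LENGTH = 10;

    public static string Encode3(byte[] data)
    {
        // The callback receives the new string's buffer as a Span<char>;
        // 'data' is passed as state so the lambda captures nothing.
        return string.Create(RESULT_LENGTH, data, (span, state) =>
        {
            for (int i = 0; i < span.Length; i++)
            {
                span[i] = DetermineChar(state, i);
            }
        });
    }

    // Same dummy algorithm as in the question.
    private static char DetermineChar(byte[] data, int index) => 'a';
}
```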

Edit

Thank you for all your responses. Maybe the word 'safe' in my question caused more confusion than it did good. I didn't mean it as the opposite of the unsafe keyword, but in the vernacular sense.


Solution

  • Ultimately, the only time this is "safe" (in the vernacular sense, not in the unsafe sense) is when you own the string and it has not yet been exposed to any external code that may expect it to be immutable. The only common scenario for this is when you're constructing a new string and can't just use the GetString methods on an Encoding, for example because the source data is discontiguous and may span multiple Encoder steps.

    So basically, the scenario shown in Encode1, where a new string of known length is allocated and its character data is immediately overwritten, is the only reasonable usage. Once the string is in the wild, leave it immutable.

    However, if you can avoid it at all, I would. It definitely makes sense in the context of Encode1, but...

    One scenario to be especially cautious of: interned strings (constants, literals, etc.). You don't own these.
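The "once the string is in the wild" rule can be made concrete with a deliberately broken sketch. The class and method names are hypothetical, and the code needs AllowUnsafeBlocks. A string that has already been handed to a `Dictionary` is overwritten in place, after which a lookup for its original text fails, because the dictionary stored the hash code of the old contents. Interned literals are the worst case of the same problem, since every use of the same literal may see the change.

```csharp
using System.Collections.Generic;

public static class MutatedKeyDemo
{
    // Anti-pattern: overwrites the first character of an existing string in place.
    public static unsafe void OverwriteFirstChar(string s, char replacement)
    {
        if (s.Length == 0)
        {
            return; // nothing to overwrite; string.Empty is shared, so never write to it
        }

        fixed (char* p = s)
        {
            p[0] = replacement;
        }
    }

    // Stores a freshly allocated string as a dictionary key, mutates it afterwards,
    // and reports whether the original text can still be found.
    public static bool CanStillFindOriginalKey()
    {
        string key = new string('a', 3);          // "aaa", owned by us so far
        var counts = new Dictionary<string, int> { [key] = 1 };

        // The string is now "in the wild": the dictionary holds it and its hash code.
        OverwriteFirstChar(key, 'b');             // the stored key now reads "baa"

        return counts.ContainsKey("aaa");         // hash matches, but the key no longer equals "aaa"
    }
}
```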