c# unicode arrays bit-shift utf-16

Getting UTF-16 bytes without calling Encoding.Unicode.GetBytes


I want to populate part of a buffer with the UTF-16 encoded bytes of a string without allocating an intermediate byte array (i.e. without calling Encoding.Unicode.GetBytes(str)).

Assuming I know that the string contains only ASCII characters, is the following code safe?

        for (var i = 0; i < str.Length; i++)
        {
            var code = char.ConvertToUtf32(str, i);

            // Encoding.Unicode is little-endian, so the low byte is written first
            var low = (byte) (code & 0xFF);
            var high = (byte) ((code >> 8) & 0xFF);

            // k is the offset where we insert the bytes of the UTF-16 encoded string

            buffer[i*2 + k] = low;
            buffer[i*2 + k + 1] = high;
        }

Solution

  • There is an overload of GetBytes that writes directly into an existing byte array starting at a given index, which does what you want.

    Encoding.Unicode.GetBytes(str, 0, str.Length, buffer, k);
    // or, if you need to do something more complicated inside the `for` loop
    for (var i = 0; i < str.Length; i++)
    {
        Encoding.Unicode.GetBytes(str, i, 1, buffer, i*2 + k);
    }
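
    To check that the overload and a hand-written loop agree, here is a minimal sketch. The names Utf16BufferDemo, WriteUtf16 and WriteUtf16Manual are hypothetical, chosen only for illustration. The manual version reads str[i] directly instead of calling char.ConvertToUtf32: each char is already one UTF-16 code unit, so this should also behave for non-ASCII text, whereas ConvertToUtf32 returns a full code point for a surrogate pair and, as far as I know, throws when the index lands on a low surrogate.

```csharp
using System;
using System.Text;

static class Utf16BufferDemo
{
    // Hypothetical helper: writes the UTF-16LE bytes of str into buffer
    // starting at offset k and returns the number of bytes written.
    public static int WriteUtf16(string str, byte[] buffer, int k)
    {
        return Encoding.Unicode.GetBytes(str, 0, str.Length, buffer, k);
    }

    // Hypothetical manual equivalent: one UTF-16 code unit (char) becomes
    // two bytes, low byte first, matching Encoding.Unicode (little-endian).
    public static void WriteUtf16Manual(string str, byte[] buffer, int k)
    {
        for (var i = 0; i < str.Length; i++)
        {
            char c = str[i];
            buffer[i * 2 + k] = (byte)(c & 0xFF); // low byte
            buffer[i * 2 + k + 1] = (byte)(c >> 8); // high byte
        }
    }

    static void Main()
    {
        const int k = 3; // arbitrary offset, for illustration only
        var a = new byte[16];
        var b = new byte[16];

        int written = WriteUtf16("Hi!", a, k);
        WriteUtf16Manual("Hi!", b, k);

        Console.WriteLine(written);                              // 6
        Console.WriteLine(BitConverter.ToString(a, k, written)); // 48-00-69-00-21-00
        Console.WriteLine(BitConverter.ToString(a) == BitConverter.ToString(b)); // True
    }
}
```

    Both versions assume the buffer has at least str.Length * 2 bytes free after offset k. The GetBytes overload throws an ArgumentException if it does not, while the manual loop fails with an IndexOutOfRangeException partway through.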