c# string unicode astral-plane

string and 4-byte Unicode characters


I have a question about strings and chars in C#. I found that a string in C# is a Unicode string and that a char takes 2 bytes, so every char is one UTF-16 code unit. That's great, but I also read on Wikipedia that some characters take 4 bytes in UTF-16.

I'm writing a program that lets you draw characters for alphanumeric displays. The program also has a tester where you can type a string, and it draws the string so you can see how it looks.

So how should I work with strings when the user enters a character that takes 4 bytes, i.e. 2 chars? I need to go through the string character by character, find each character in the list, and draw it on the panel.


Solution

  • You could do:

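    for( int i = 0; i < str.Length; ++i ) {
        // Decode the code point starting at index i (one or two chars).
        int codePoint = Char.ConvertToUtf32( str, i );
        if( codePoint > 0xffff ) {
            // Supplementary-plane character: skip its low surrogate.
            i++;
        }
        // Look up and draw codePoint here.
    }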
    

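    codePoint then holds the full Unicode code point as a 32-bit integer, whether it was stored as one char or as a surrogate pair. Note that Char.ConvertToUtf32(string, int) throws an ArgumentException if the char at that index is an unpaired surrogate.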
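
    If the tester accepts arbitrary user input, it may be worth handling a lone surrogate explicitly. The following is only a sketch of one way to do that, not part of the original answer: the class name CodePointWalker and the method CodePoints are hypothetical, and substituting U+FFFD for malformed input is an assumption about what the display should show.

```csharp
using System;
using System.Collections.Generic;

public static class CodePointWalker
{
    // Yields each Unicode code point in 'text' as a 32-bit integer.
    // Assumption: an unpaired surrogate is reported as U+FFFD
    // (the replacement character) rather than raising an exception.
    public static IEnumerable<int> CodePoints(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c)
                && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                // Valid surrogate pair: combine both UTF-16 code units
                // and skip the low surrogate on the next iteration.
                yield return char.ConvertToUtf32(c, text[i + 1]);
                i++;
            }
            else if (char.IsSurrogate(c))
            {
                // Lone high or low surrogate (malformed input).
                yield return 0xFFFD;
            }
            else
            {
                // Ordinary BMP character: the char value is the code point.
                yield return c;
            }
        }
    }
}
```

    With this in place, foreach (int cp in CodePointWalker.CodePoints("a\U0001F600b")) visits 0x61, 0x1F600 and 0x62 in turn, so a glyph list keyed by int code point can be searched once per character regardless of how many chars it occupies.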