
Defining 4-byte UTF-16 character in a string


I have read a question about UTF-8, UTF-16 and UCS-2, and almost all of the answers state that UCS-2 is obsolete and that C# uses UTF-16.

However, all my attempts to create the 4-byte character U+1D11E in C# failed, so I actually think C# uses the UCS-2 subset of UTF-16 only.

Here are my attempts:

string s = "\u1D11E"; // gives the 2 character string "ᴑE", because \u1D11 is ᴑ
string s = (char) 0x1D11E; // won't compile because of an overflow
string s = Encoding.Unicode.GetString(new byte[] {0xD8, 0x34, 0xDD, 0x1E}); // gives 㓘ờ

Are C# strings really UTF-16, or are they actually UCS-2? If they are UTF-16, how would I get the G clef (violin clef, U+1D11E) into my C# string?


Solution

  • C# strings are UTF-16. The \u escape takes exactly four hex digits, so use the capital \U escape, which takes eight:

      string s = "\U0001D11E";
    

    You also overlooked byte order: Encoding.Unicode is little-endian UTF-16, so the low byte of each 16-bit code unit comes first:

      string t = Encoding.Unicode.GetString(new byte[] { 0x34, 0xD8, 0x1E, 0xDD });
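
To check that these strings really hold one code point stored as a UTF-16 surrogate pair, a minimal sketch along the following lines may help. It assumes C# 9 or later (top-level statements); the variable names are illustrative only. It also shows two further ways to build the same string that the answer does not mention: char.ConvertFromUtf32 and Encoding.BigEndianUnicode, the latter applied to the byte order from the question.

```csharp
using System;
using System.Text;

// U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic Multilingual Plane,
// so UTF-16 stores it as two char values: a high and a low surrogate.
string fromEscape = "\U0001D11E";                      // 8-digit escape
string fromCodePoint = char.ConvertFromUtf32(0x1D11E); // from the numeric code point
string fromSurrogates = "\uD834\uDD1E";                // surrogate pair written out
string fromLittleEndian = Encoding.Unicode.GetString(
    new byte[] { 0x34, 0xD8, 0x1E, 0xDD });            // low byte of each unit first
string fromBigEndian = Encoding.BigEndianUnicode.GetString(
    new byte[] { 0xD8, 0x34, 0xDD, 0x1E });            // byte order used in the question

// Length counts UTF-16 code units, not code points.
Console.WriteLine(fromEscape.Length);                   // 2
Console.WriteLine(char.IsHighSurrogate(fromEscape[0])); // True
Console.WriteLine(char.IsLowSurrogate(fromEscape[1]));  // True

// Decode the pair back into a single code point.
int codePoint = char.ConvertToUtf32(fromEscape, 0);
Console.WriteLine(codePoint.ToString("X"));             // 1D11E

// All five constructions yield the same string.
bool allEqual = fromEscape == fromCodePoint
             && fromEscape == fromSurrogates
             && fromEscape == fromLittleEndian
             && fromEscape == fromBigEndian;
Console.WriteLine(allEqual);                            // True
```

Because Length reports code units, the string has length 2 even though it contains a single character. A UCS-2-only implementation would have no way to represent this code point at all.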