Tags: c#, .net, uwp, surrogate-pairs

Convert from '\ud835' format to "𝓛" in c# [UWP]


I have a string with some unusual characters, for example "𝓛𝓮𝓪𝓭 𝓑𝓪𝓬𝓴𝓮𝓷𝓭". I need to check whether a list contains the first character of the string. But when I index into the string, I always get \ud835. I have tried Char.ConvertFromUtf32(\ud835) and a few other approaches, but I can't figure out how to get the first character as "𝓛".


Solution

  • .NET strings are encoded as UTF-16, and 𝓛 (U+1D4DB) lies outside the Basic Multilingual Plane, so it is stored as a surrogate pair: \ud835 followed by \udcdb.

    A surrogate pair occupies two char values, so s[0] is only the high surrogate. Combine s[0] and s[1] to get the whole letter:

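            using System;
            var s = "𝓛𝓮𝓪𝓭 𝓑𝓪𝓬𝓴𝓮𝓷𝓭";
            // 𝓛 spans two chars: s[0] == '\ud835' (high surrogate), s[1] == '\udcdb' (low surrogate)
            Console.WriteLine(new string(new[] { s[0], s[1] }) == "𝓛"); // True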
    

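    Built-in helpers such as Char.IsSurrogate, Char.IsHighSurrogate and Char.IsLowSurrogate tell you whether a char is half of a surrogate pair, and Char.ConvertToUtf32(s, index) combines the pair at that position into a single code point (0x1D4DB for 𝓛). Char.ConvertFromUtf32(0xD835) fails because 0xD835 is only half of a pair, not a valid code point on its own.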
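To make the encoding concrete, here is a minimal sketch that applies the UTF-16 rule by hand to 𝓛 (code point U+1D4DB): subtract 0x10000, add the top 10 bits of the result to 0xD800 for the high surrogate, and add the bottom 10 bits to 0xDC00 for the low surrogate. It assumes a .NET 5+ console project with C# 9 top-level statements (UWP projects usually compile with an older language version, so there you would move the helper into a class). `ToSurrogatePair` is a hypothetical helper for illustration; its result is compared with the built-in Char.ConvertFromUtf32.

```csharp
using System;

// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into its UTF-16 high and low surrogates by hand.
static (char High, char Low) ToSurrogatePair(int codePoint)
{
    if (codePoint < 0x10000 || codePoint > 0x10FFFF)
        throw new ArgumentOutOfRangeException(nameof(codePoint));

    int offset = codePoint - 0x10000;             // 20-bit value
    char high = (char)(0xD800 + (offset >> 10));  // top 10 bits
    char low = (char)(0xDC00 + (offset & 0x3FF)); // bottom 10 bits
    return (high, low);
}

var (hi, lo) = ToSurrogatePair(0x1D4DB); // 𝓛
Console.WriteLine($"{(int)hi:X4} {(int)lo:X4}"); // D835 DCDB

// The built-in conversion produces the same two chars.
string viaFramework = char.ConvertFromUtf32(0x1D4DB);
Console.WriteLine(viaFramework.Length);                            // 2
Console.WriteLine(viaFramework[0] == hi && viaFramework[1] == lo); // True
```

This is why indexing the string returns \ud835: s[0] holds only the high half of the pair.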
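Applied to the original question, a minimal sketch might read the first code point instead of the first char and then look it up. It assumes the list stores each symbol as a string (a hypothetical List<string> named `letters`), because 𝓛 does not fit in a single char; `FirstCodePoint` is also a hypothetical helper, and the program uses C# 9 top-level statements. StringInfo.GetNextTextElement is shown as an alternative that returns the first text element (user-perceived character).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: return the first Unicode code point of text as a string
// (two chars when text starts with a surrogate pair, otherwise one).
static string FirstCodePoint(string text)
{
    if (string.IsNullOrEmpty(text))
        return string.Empty;

    // ConvertToUtf32(string, int) combines text[0] and text[1] when they form a
    // valid surrogate pair; it throws if text[0] is an unpaired surrogate.
    int codePoint = char.ConvertToUtf32(text, 0);
    return char.ConvertFromUtf32(codePoint);
}

var s = "𝓛𝓮𝓪𝓭 𝓑𝓪𝓬𝓴𝓮𝓷𝓭";
var letters = new List<string> { "𝓐", "𝓑", "𝓛" }; // hypothetical lookup list

string first = FirstCodePoint(s);
Console.WriteLine(first == "𝓛");            // True
Console.WriteLine(letters.Contains(first)); // True
Console.WriteLine(char.IsHighSurrogate(s[0]) && char.IsLowSurrogate(s[1])); // True

// StringInfo works on text elements, which also keeps combining marks
// attached to their base letter.
string firstElement = StringInfo.GetNextTextElement(s);
Console.WriteLine(firstElement == first);   // True
```

Because Char.ConvertToUtf32 throws when the string starts with an unpaired surrogate, check Char.IsSurrogatePair(s, 0) first if the input may be malformed.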