I have a string with some unusual characters, for example "𝓛𝓮𝓪𝓭 𝓑𝓪𝓬𝓴𝓮𝓷𝓭". I need to check whether a List contains the first item in the string. But if I index the string, the first item always comes back as \ud835. After trying Char.ConvertFromUtf32(\ud835) and some other attempts, I simply can't work out how to get the first item as "𝓛".
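𝓛 (U+1D4DB) lies outside the Basic Multilingual Plane, so in UTF-16, the encoding used by .NET strings, it is represented with a surrogate pair.
A surrogate pair consists of two char values, which is why indexing the string at 0 returns only the high surrogate \ud835: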
var s = "𝓛𝓮𝓪𝓭 𝓑𝓪𝓬𝓴𝓮𝓷𝓭";
// s[0] and s[1] are the high and low surrogates that together encode "𝓛".
Console.WriteLine(new string(new[] { s[0], s[1] }) == "𝓛"); // prints True
There are built-in helper methods such as Char.ConvertToUtf32 and Char.IsSurrogate that you can use to detect whether you are in this situation.
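To tie this back to the original List check, here is a minimal sketch of one way to pull out the first symbol whether or not it is a surrogate pair. The class name FirstSymbol, the method Of, and the list contents are hypothetical and only for illustration; the sketch relies on char.IsSurrogatePair and char.ConvertToUtf32, both of which are part of the .NET base class library.

```csharp
using System;
using System.Collections.Generic;

static class FirstSymbol
{
    // Hypothetical helper: returns the first Unicode code point of s as a string.
    // It takes two chars when s starts with a valid surrogate pair, otherwise one.
    public static string Of(string s)
    {
        if (string.IsNullOrEmpty(s)) return string.Empty;
        bool isPair = s.Length > 1 && char.IsSurrogatePair(s[0], s[1]);
        return s.Substring(0, isPair ? 2 : 1);
    }
}

static class Program
{
    static void Main()
    {
        var s = "𝓛𝓮𝓪𝓭 𝓑𝓪𝓬𝓴𝓮𝓷𝓭";
        var list = new List<string> { "𝓛", "𝓑" }; // assumed sample data

        string first = FirstSymbol.Of(s);
        Console.WriteLine(list.Contains(first)); // prints True

        // The code point itself, if a number is more convenient than a string.
        int codePoint = char.ConvertToUtf32(s, 0);
        Console.WriteLine(codePoint.ToString("X")); // prints 1D4DB
    }
}
```

This only handles code points. If the text may contain combining marks or other multi-code-point sequences, System.Globalization.StringInfo.GetNextTextElement(s, 0) is likely a better fit, since it returns a whole text element rather than a single code point.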