Tags: c#, .net, encoding, ascii, non-ascii-characters

Encoding.GetEncoding("Cyrillic") turns all text into question marks in .NET


Why is the text 𝗜𝗡𝗗𝗢𝗢𝗥 𝗦𝗢𝗙𝗧𝗕𝗔𝗟𝗟 𝗧𝗢𝗨𝗥𝗡𝗔𝗠𝗘𝗡𝗧 𝗗𝗜𝗔𝗠𝗢𝗡𝗗 𝗝𝗔𝗫𝗫 𝗔𝗡𝗗 𝗛𝗜𝗧𝗭 being converted to the following by the method below?

???????????? ???????????????? ???????????????????? ?????????????? ???????? ?????? ????????

I don't believe this happened before, but I just noticed it. I am using .NET Framework 4.8.

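    using System.Text;

    public static class StringExtensions
    {
        // Encodes with the Cyrillic code page, then decodes as ASCII. Accented Latin
        // letters get a close match (e.g. é -> e); characters with no mapping become '?'.
        public static string RemoveAccent(this string txt)
        {
            if (txt == null)
                return txt;

            byte[] bytes = Encoding.GetEncoding("Cyrillic").GetBytes(txt);
            return Encoding.ASCII.GetString(bytes);
        }
    }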
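The question marks come in pairs: 12 for the six letters of INDOOR, 16 for the eight letters of SOFTBALL. A minimal sketch of why, assuming the styled letters are the Unicode "Mathematical Sans-Serif Bold" capitals (A = U+1D5D4): each one is a single code point outside the Basic Multilingual Plane, which a .NET string stores as a surrogate pair of two `char`s, and that is consistent with each letter producing two '?' in the output. `SurrogateDemo` and `Styled` are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

public static class SurrogateDemo
{
    // Hypothetical helper: maps plain capitals A-Z to Mathematical Sans-Serif Bold
    // capitals (U+1D5D4 is bold sans-serif A). Only valid for uppercase A-Z input.
    public static string Styled(string plain)
    {
        var sb = new StringBuilder();
        foreach (char c in plain)
            sb.Append(char.ConvertFromUtf32(0x1D5D4 + (c - 'A')));
        return sb.ToString();
    }

    public static void Main()
    {
        string styled = Styled("INDOOR");

        // Six visible letters, but twelve UTF-16 code units.
        Console.WriteLine(styled.Length);                        // 12

        // The first two chars form one surrogate pair, i.e. one code point.
        Console.WriteLine(char.IsSurrogatePair(styled, 0));      // True
        Console.WriteLine($"U+{char.ConvertToUtf32(styled, 0):X}"); // U+1D5DC
    }
}
```

An encoder that cannot map such a code point has to substitute something for the whole pair, which is why a six-letter word turns into twelve question marks rather than six.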

Solution

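  • The text used styled Unicode letters (the "Mathematical Sans-Serif Bold" characters, e.g. 𝗜 instead of I) rather than plain ASCII letters, which is why the method behaved differently than it had with the ASCII text I used before. The Cyrillic code page has no mapping for these characters, so each one came out as question marks. Normalizing the string to FormKD (compatibility decomposition) first turns each styled letter into its plain equivalent, so I added the check below before the GetEncoding call, and it works now.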

    // Map compatibility characters to their plain forms (e.g. 𝗜 -> I) before encoding
    if (!txt.IsNormalized(NormalizationForm.FormKD))
    {
        txt = txt.Normalize(NormalizationForm.FormKD);
    }
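If the code-page round trip itself is not a requirement, a commonly used alternative (a different technique from the one in the question) is to normalize to FormKD and then drop the combining marks that the decomposition splits off from accented letters. A minimal sketch, assuming the only goal is to strip accents and flatten styled letters; `TextCleanup` and `RemoveAccentsNormalized` are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class TextCleanup
{
    // Hypothetical alternative to the Cyrillic round trip.
    // FormKD maps compatibility characters such as 𝗜 to plain I and splits
    // accented letters into a base letter followed by combining marks.
    public static string RemoveAccentsNormalized(string txt)
    {
        if (txt == null)
            return null;

        string decomposed = txt.Normalize(NormalizationForm.FormKD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Drop combining marks (the accents); keep every other char.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }

        // Recompose whatever remains into canonical composed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        // Build the styled sample from Mathematical Sans-Serif Bold capitals (A = U+1D5D4).
        var styled = new StringBuilder();
        foreach (char c in "INDOOR")
            styled.Append(char.ConvertFromUtf32(0x1D5D4 + (c - 'A')));

        Console.WriteLine(RemoveAccentsNormalized(styled.ToString())); // INDOOR
        Console.WriteLine(RemoveAccentsNormalized("Crème Brûlée"));     // Creme Brulee
    }
}
```

This version does not need any code-page support, which matters on .NET Core and .NET 5+, where encodings like Cyrillic are only available after registering `CodePagesEncodingProvider`. Letters with no decomposition (for example ß or ø) and non-Latin scripts pass through unchanged instead of being approximated or replaced with '?', so it is worth checking that this matches what RemoveAccent is expected to do with them.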