c# utf-8 character-encoding data-conversion

Which encoding replaces "í" with "\303 \255"?


Does anyone know which encoding this is? They tell me it is UTF-8, but I can't see how. This input:

aquí (notice the accent on the i)

should produce this:

aqu\303 \255

It seems to be based on this table https://www.acc.umu.se/~saasha/charsets/, but I can't see how to get the suggested output from an arbitrary user input string in .NET, of course without building this crazy conversion table.

Any ideas?


Solution

  • It is UTF-8: 303 255 in octal is 195 173 in decimal, and those numbers probably look more familiar. See the dec and oct headers in the table you linked.

    There is no built-in type that's going to produce octal output for some characters - you'll have to decide which characters to "octal-escape" and which to keep.

    The following snippet produces the output you desired (without the extra space), and escapes data based on whether a character is within the ASCII set:

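    using System;
    using System.Text;

    string str = "aquí";
    StringBuilder output = new StringBuilder();

    // Encode the whole string at once, so characters stored as surrogate
    // pairs are converted correctly instead of being split in half.
    foreach (byte b in Encoding.UTF8.GetBytes(str))
    {
        if (b < 128)
        {
            // ASCII byte: keep the character as-is.
            output.Append((char)b);
        }
        else
        {
            // Byte of a multi-byte sequence: emit a backslash-octal escape.
            output.Append('\\').Append(Convert.ToString(b, 8));
        }
    }

    string result = output.ToString(); // aqu\303\255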
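
To check the octal/decimal arithmetic above, here is a minimal sketch that prints the UTF-8 bytes of a string in a chosen base. The class name `OctalCheck` and the method `Utf8BytesInBase` are made up for illustration and are not part of .NET. Only `Encoding.UTF8.GetBytes` and `Convert.ToString(byte, int)` are framework calls. The character í is U+00ED, which UTF-8 encodes as two bytes, 0xC3 0xAD.

```csharp
using System;
using System.Linq;
using System.Text;

public static class OctalCheck
{
    // Hypothetical helper (not a framework API): returns the UTF-8 bytes of
    // 'text', each formatted in base 'toBase'. Convert.ToString only accepts
    // 2, 8, 10 or 16 as the base.
    public static string[] Utf8BytesInBase(string text, int toBase)
    {
        return Encoding.UTF8.GetBytes(text)
            .Select(b => Convert.ToString(b, toBase))
            .ToArray();
    }
}
```

Calling `OctalCheck.Utf8BytesInBase("\u00ED", 10)` should give "195" and "173", and the same call with base 8 should give "303" and "255", which matches the escaped output in the question.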