Tags: c#, utf-8, character-encoding, ascii

Why is the console not printing the characters I am expecting?


I'm currently trying to educate myself about the different encoding types. I tried to make a simple console app to show me the difference between them.

byte[] byteArray = new byte[] { 125, 126, 127, 128, 129, 130, 250, 254, 255 };
string s = Encoding.Default.GetString(byteArray);
Console.OutputEncoding = Encoding.Default;
Console.WriteLine("Default: " + s);

s = Encoding.ASCII.GetString(byteArray);
Console.OutputEncoding = Encoding.ASCII;
Console.WriteLine("ASCII: " + s);

s = Encoding.UTF8.GetString(byteArray);
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine("UTF8: " + s);

The output, however, is nothing like I expected.

Default: }~€‚úûüýþÿ
ASCII: }~?????????
UTF8: }~���������

Hmm... the characters do not copy well from the console output to here either, so here's a screenshot.

[screenshot: console output]

What I expect is to see the extended ASCII characters. The default encoding is almost correct, but it cannot display 251, 252 and 253. That might be a shortcoming of Console.WriteLine(), though I wouldn't expect that.


The representation of the variable when debugging is as follows:

Default encoded string = "}~€‚úûüýþÿ"
ASCII encoded string = "}~?????????"
UTF8 encoded string = "}~���������"

Can someone tell me what I'm doing wrong? I expect one of the encoding types to properly display the extended ASCII table but apparently none can...

A bit of context:
I am trying to determine which encoding would be best as a standard in our company. I personally think UTF8 will do, but my supervisor would like to see some examples before we decide.

Obviously we know we will need to use other encoding types every now and then (serial communication, for example, uses 7 bits, so we can't use UTF8 there), but in general we would like to stick with one encoding type. Currently we are using default, ASCII and UTF8 at random, which is not a good thing.

Edit:
The output of this line:

Console.WriteLine("Default: {0} for {1}", s, Console.OutputEncoding.CodePage);

[screenshot: output with code page]

Edit 2:
Since I thought there might not be an encoding in which the extended ASCII characters correspond to the decimal numbers in the table I linked to, I turned it around. This:

char specialChar = '√';
int charNumber = (int)specialChar;

gives me the number 8730, which in the table is 251.


Solution

  • The output encoding in your case should be mostly irrelevant since you're not even working with Unicode. Furthermore, you need to change your console window settings from Raster fonts to a TrueType font, like Lucida Console or Consolas. When the console is set to raster fonts, you can only have the OEM encoding (CP850 in your case), which means Unicode doesn't really work at all.

    However, all that is moot as well, since your code is ... weird, at best. First, as to what is happening here: You have a byte array, interpret that in various encodings and get a (Unicode) string back. When writing that string to the console, the Unicode characters are converted to their closest equivalent in the codepage of the console (850 here). If there is no equivalent, not even close, then you'll get a question mark ?. This happens most prominently with ASCII and characters above 127, because they simply don't exist in ASCII.
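
    The two steps described above can be sketched as follows. This is only an illustration, assuming a recent .NET SDK that supports top-level statements; `Dump` is a hypothetical helper that prints code points instead of glyphs, so the console font and code page cannot distort the result.

    ```csharp
    using System;
    using System.Linq;
    using System.Text;

    // The bytes from the question.
    byte[] bytes = { 125, 126, 127, 128, 129, 130, 250, 254, 255 };

    // Step 1: decoding always produces a Unicode (UTF-16) string,
    // whichever encoding is used to interpret the bytes.
    string ascii = Encoding.ASCII.GetString(bytes);
    string utf8 = Encoding.UTF8.GetString(bytes);

    // Hypothetical helper: list the code points of a string as U+XXXX.
    string Dump(string s) =>
        string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));

    // Bytes above 127 do not exist in ASCII and are decoded as U+003F ('?').
    Console.WriteLine("ASCII: " + Dump(ascii));
    // Bytes that are not valid UTF-8 are decoded as U+FFFD (replacement character).
    Console.WriteLine("UTF8:  " + Dump(utf8));

    // Step 2: encoding a character that the target encoding lacks
    // also falls back to '?', which is byte 63.
    byte[] encoded = Encoding.ASCII.GetBytes("√");
    Console.WriteLine(encoded[0]);
    ```

    In both directions the original information is already lost by the time anything reaches the console, so changing Console.OutputEncoding afterwards cannot bring it back.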

    If you want to see the characters you expect, then either use the correct encodings throughout instead of meddling around until it somewhat works, or just use the right characters to begin with.

    Console.WriteLine("√ⁿ²");
    

    should actually work because it runs through the encoding translation processes described above.
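
    The other option, using the correct encoding throughout, might look like the sketch below. It assumes that the table in the question is code page 437 (where byte 251 is √), which the question does not confirm, and that the code runs on .NET Core or .NET 5+, where legacy code pages have to be registered before use.

    ```csharp
    using System;
    using System.Text;

    // On .NET Core / .NET 5+ legacy code pages are only available
    // after registering the code pages provider.
    Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Assumption: the "extended ASCII" table in question is code page 437.
    Encoding cp437 = Encoding.GetEncoding(437);

    // The three byte values the question could not get displayed.
    byte[] bytes = { 251, 252, 253 };
    string s = cp437.GetString(bytes);

    Console.WriteLine(s);                       // the string is "√ⁿ²"
    Console.WriteLine((int)s[0]);               // 8730, the Unicode code point of √
    Console.WriteLine(cp437.GetBytes("√")[0]);  // 251, its position in code page 437
    ```

    This also explains the 8730 versus 251 observation: the first is the Unicode code point of the character, the second is its byte value in one particular code page.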