I am trying to figure out what character set putchar uses. Seemingly, it cannot print multi-byte characters:
putchar('€') //gcc warning: multi-character character constant
But when the codepage of the terminal in Windows is set to 1252 (West European Latin) with chcp 1252, the following code is able to print the Euro sign:
putchar(128)
But still, even though the terminal's charset is set to 1252, putchar('€') cannot print the Euro sign.
Can anybody please explain the above (seeming) discrepancy to me?
Thank you very much.
char in C for all practical purposes means "byte", not "character".
Your source file is most likely encoded in UTF-8, where the euro symbol is encoded as the following 3 bytes: 0xE2 0x82 0xAC.
putchar, as the name implies, writes single bytes. C as a language has no notion of "characters" or "encodings", and GCC by default uses the exact bytes it found in the source file. So in your case '€' becomes a multi-character constant built from those three bytes, and putchar prints only the byte 0xAC (the least significant byte of '€') to the standard output. It doesn't matter what it looks like in your editor or what encoding the file is supposed to be. GCC doesn't care; it copies bytes as-is.
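A minimal sketch of the truncation described above, assuming a 32-bit int and GCC's usual packing of a multi-character constant (0xE282AC for the three UTF-8 bytes). The function names byte_written_by_putchar and put_euro_utf8 are made up for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: putchar takes an int but writes it converted to
   unsigned char, so only the low 8 bits of the argument reach the output. */
unsigned char byte_written_by_putchar(int c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: write the three UTF-8 bytes of the euro sign one
   at a time. Returns the number of bytes written successfully. */
int put_euro_utf8(void)
{
    static const unsigned char euro[] = { 0xE2, 0x82, 0xAC };
    int written = 0;
    size_t i;

    for (i = 0; i < sizeof euro; i++) {
        if (putchar(euro[i]) == EOF)
            break;
        written++;
    }
    return written;
}
```

Here byte_written_by_putchar(0xE282AC) yields 0xAC, which matches the single byte mentioned above, while put_euro_utf8() sends all three bytes. Whether those three bytes show up as a euro sign still depends on the terminal.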
What the terminal displays for the stream of bytes coming from a program depends solely on the settings of that terminal. If you want to display UTF-8 encoded text in the Windows terminal, you should enter chcp 65001 and change the font to Lucida Console.
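The code page can also be switched from inside the program instead of typing chcp. The following is only a sketch: SetConsoleOutputCP and CP_UTF8 (65001) come from the Windows API, the function name print_euro_utf8 is made up, and the font still has to contain the glyph. On non-Windows systems the call is simply skipped.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: ask the console to interpret output as UTF-8
   (Windows only), then write the euro sign as explicit UTF-8 bytes.
   Returns 0 on success, -1 if writing failed. */
int print_euro_utf8(void)
{
#ifdef _WIN32
    /* Same effect as "chcp 65001" for this console's output. */
    SetConsoleOutputCP(CP_UTF8);
#endif
    /* Hex escapes keep the bytes independent of the source file encoding. */
    return fputs("\xE2\x82\xAC\n", stdout) == EOF ? -1 : 0;
}
```

Writing the bytes as hex escapes sidesteps the question of how the compiler read the source file, so only the terminal setting is being tested.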
Your editor displays the bytes according to a specified encoding, and the terminal displays the same bytes using its own encoding. So, as long as you use GCC or Clang with default settings, if the editor and terminal use the same encoding, you should see the same characters in both programs.
EDIT: A few remarks about how GCC handles encodings:
There are two options: -finput-charset and -fexec-charset. GCC treats bytes in narrow string and char literals literally only if those two options are identical. If they are not, GCC converts them from the input encoding to the exec encoding.
After a bit of testing, I conclude that for some reason your GCC runs with Windows-1250 as input encoding and UTF-8 as exec encoding.
If you want to make really sure you are using the right encoding, add -finput-charset=cp1250 -fexec-charset=cp1250 to the compiler options.
Also, this way you can make your program run in the default encoding of your console if you so desire.
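One way to check which bytes the compiler actually put into a literal is to dump them in hex. This is a rough sketch; hex_bytes is a made-up helper, not part of any library:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the bytes of s as space-separated
   uppercase hex into out. Returns the number of bytes formatted;
   stops early (keeping whole tokens) if out is too small. */
size_t hex_bytes(const char *s, char *out, size_t out_size)
{
    size_t len = strlen(s);
    size_t pos = 0;
    size_t i;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    for (i = 0; i < len; i++) {
        int n = snprintf(out + pos, out_size - pos, "%s%02X",
                         i ? " " : "", (unsigned)(unsigned char)s[i]);
        if (n < 0 || (size_t)n >= out_size - pos) {
            out[pos] = '\0';   /* drop the partially written token */
            break;
        }
        pos += (size_t)n;
    }
    return i;
}
```

Calling hex_bytes("€", buf, sizeof buf) and printing buf in a source file saved as UTF-8 should give E2 82 AC when input and exec charsets match. With an exec charset of cp1250 or cp1252 one would expect a single byte 80 instead, assuming the GCC build supports that charset name.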