#include <stdio.h>
wchar_t wc = L' 459';
printf("%d", wc); //result : 32
I know that a space is decimal 32 in the ASCII table.
What I don't understand is this: as far as I know, if a variable doesn't have enough room to store a value, it keeps only the last digits of the original value.
For example, if I put the binary value '1100 1001 0011 0110' into a single-byte variable, it becomes '0011 0110', which is the last byte of the original value.
But the code above shows the first byte of the original value.
I'd like to know what happens at the memory level when I execute the code above.
unsigned long long x = 0x0041004200430044ULL;
printf("%016llx\n", x); //prints 0041004200430044
wchar_t wc;
wc = (wchar_t)x; //keeps only the low-order bytes
printf("%04X\n", (unsigned)wc); //prints 0044 as you expect (2-byte wchar_t)
wc = L'\x0041\x0042\x0043\x0044'; //multi-character literal, MSVC uses the first character
printf("%04X\n", (unsigned)wc); //prints 0041
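If you assign an integer value that's too large, it is truncated: only the low-order 2 bytes, 0x0044, are kept, because that is what fits in a 2-byte wchar_t.
If you try to put several elements into one element, the compiler (MSVC here) takes the first element, 0x0041, which fits. The value of such a multi-character literal is implementation-defined, so other compilers may pick differently. L'x' is meant to be a single wide character.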
VS2019 will issue a warning for wchar_t wc = L' 459' unless the warning level is set below 3, which is not recommended. Use warning level 3 or higher.
In C++, wchar_t is a primitive type, not a typedef for unsigned short, but both are 2 bytes on Windows (wchar_t is 4 bytes on Linux).
Note that 'abcd' is 4 bytes. The L prefix indicates 2 bytes per element (on Windows), so L'abcd' is 8 bytes, which cannot fit in a single wchar_t.
To see what is inside wc, let's look at the Unicode character L'X', which has the UTF-16 encoding 0x0058 (the same as its ASCII value; the two agree for code points below 128).
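#include <stdio.h>
#include <string.h>
#include <wchar.h>
int main(void)
{
    wchar_t wc = L'X';
    printf("%lc\n", (wint_t)wc); //%lc is the standard specifier for a wide character
    unsigned char buf[sizeof wc];
    memcpy(buf, &wc, sizeof wc); //copy the raw bytes; sizeof wc is 2 on Windows
    for (size_t i = 0; i < sizeof wc; i++)
        printf("%02X ", buf[i]);
    printf("\n");
    return 0;
}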
On Windows the output will be 58 00. It is not 00 58 because Windows runs on little-endian systems, which store the least significant byte first.
Another quirk is that UTF-16 uses 4 bytes (a surrogate pair) for some code points, so you will get a warning for this line:
wchar_t wc = L'😀';
Instead, use a string:
const wchar_t *wstr = L"😀";
::MessageBoxW(0, wstr, 0, 0); //console may not display this correctly
This string will be 6 bytes on Windows (2 elements for the surrogate pair + the null terminating character).