After reading the comments, thanks to @M.M and @AnttiHaapala I fixed my code but still got incorrect outputs...
New Code:
#include <iostream>
int main() {
char * myChar;
myChar = new char[2];
myChar[1] = 0x00;
myChar[0] = 0xE0;
unsigned short myShort;
myShort = ((myChar[1] << 8) | (myChar[0]));
std::cout << myShort << std::endl;
return 0;
}
Output:
65504
or if you reverse the order
57344
So I have a two-byte value that I am reading from a file and would like to convert it to an unsigned short so I can use the numerical value.
Example code:
#include <iostream>
int main() {
char myChar[2];
myChar[1] = 'à';
myChar[0] = '\0';
unsigned short myShort;
myShort = ((myChar[1] << 8) | (myChar[0]));
std::cout << myShort << std::endl;
return 0;
}
Output:
40960
But à\0 or E0 00 should have a value of 224 as an unsigned two-byte value?
Also very interesting...
This code:
#include <iostream>
int main() {
char * myChar;
myChar = "\0à";
unsigned short myShort;
myShort = ((myChar[1] << 8) | (myChar[0]));
std::cout << myShort << std::endl;
return 0;
}
Outputs:
49920
NOTE: The original code has a complicating factor in that the source is UTF-8 encoded. Please check the edit history of this answer to see my comments on that. However, I think that is not the main issue you are asking about, so I have changed my answer to address only the edit. To avoid UTF-8 conversion issues, use '\xE0' instead of 'à'.
Regarding the edited code:
char * myChar;
myChar = new char[2];
myChar[1] = 0x00;
myChar[0] = 0xE0;
unsigned short myShort;
myShort = ((myChar[1] << 8) | (myChar[0]));
std::cout << myShort << std::endl;
The range of char (on your system) is -128 through to 127. This is common. You write myChar[0] = 224; (0xE0 is an int literal with value 224).
This is an out-of-range conversion, which causes implementation-defined behaviour (before C++20; from C++20 on, the result is defined as the value modulo 2^N for an N-bit type). Most commonly, implementations define this to adjust modulo 256 until the value is in range. So you end up with the same result as:
myChar[0] = -32;
Then the calculation (myChar[1] << 8) | myChar[0] is 0 | (-32), which is -32. Finally, you convert to unsigned short. This is another out-of-range conversion, because the range of unsigned short is [0, 65535] on your system.
However, out-of-range conversion to an unsigned type is well-defined to adjust modulo 65536 in this case, so the result is 65536 - 32 = 65504.
Reversing the order performs ((-32) << 8) | 0. Left-shifting a negative value causes undefined behaviour (before C++20), although on your system it has manifested itself as doing -32 * 256, giving -8192. Converting that to unsigned short gives 65536 - 8192 = 57344.
If you are trying to get 224 from the first example, the simplest way to do this is to use unsigned char instead of char. Then myChar[0] will hold the value 224 instead of the value -32.