
Wrong value assigned after type casting


I am using a native unsigned long variable as a buffer that holds two unsigned short values. As far as I know C++, this should be a valid method; I have used it many times to store two unsigned char values inside one unsigned short without any problem. Unfortunately, on a different architecture it behaves strangely: the value only seems to be updated after a second assignment. The (Overflow) case is there simply to demonstrate this. Can anyone shed some light on why it behaves that way?

unsigned long dwTest = 0xFFEEDDCC;

printf("sizeof(unsigned short) = %d\n", sizeof(unsigned short));
printf("dwTest = %08X\n", dwTest);

//Address + values
printf("Addresses + Values: %08X <- %08X, %08X <- %08X\n", (DWORD)(&((unsigned short*)&dwTest)[0]), (((unsigned short*)&dwTest)[0]), (DWORD)(&((unsigned short*)&dwTest)[1]), (((unsigned short*)&dwTest)[1]) );

((unsigned short*)&dwTest)[0] = (WORD)0xAAAA;
printf("dwTest = %08X\n", dwTest);

((unsigned short*)&dwTest)[1] = (WORD)0xBBBB;
printf("dwTest = %08X\n", dwTest);

//(Overflow)
((unsigned short*)&dwTest)[2] = (WORD)0x9999;

printf("dwTest = %08X\n", dwTest);

Visual C++ 2010 output (OK):

sizeof(unsigned short) = 2
dwTest = FFEEDDCC
Addresses + Values: 0031F728 <- 0000DDCC, 0031F72A <- 0000FFEE

dwTest = FFEEAAAA

dwTest = BBBBAAAA

dwTest = BBBBAAAA

ARM9 GCC Crosstool output (Doesn't work):

sizeof(unsigned short) = 2
dwTest = FFEEDDCC
Addresses + Values: 7FAFECD8 <- 0000DDCC, 7FAFECDA <- 0000FFEE

dwTest = FFEEDDCC

dwTest = FFEEAAAA

dwTest = BBBBAAAA

Solution

  • What you are trying to do is called type-punning. There are two traditional ways to do it.

    One way to do it is via pointers (what you have done). Unfortunately, this conflicts with the optimizer. In the general case the compiler cannot prove that two pointers don't alias each other (the question is undecidable, much like the halting problem). Without further rules, it would have to reload any value that may have been modified via a pointer, resulting in many potentially unnecessary reloads.

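    So the strict-aliasing rule was introduced. It basically says that two pointers can alias each other only when they point to compatible types. As a special case, a char * can alias any other pointer (but not the other way around). This breaks type-punning via pointers and lets the compiler generate more efficient code. That is most likely what you see in the ARM build: GCC may assume that a store through an unsigned short * cannot change dwTest, and so read dwTest for printf before the store has been performed. When gcc detects type-punning and has warnings enabled, it will warn you thus: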

    warning: dereferencing type-punned pointer will break strict-aliasing rules
    

    Another way to do type-punning is via the union:

    union {
        int i;
        short s[2];
    } u;
    u.i = 0xDEADBEEF;
    u.s[0] = 0xBABE;
    ....
    

    This opens up a whole new can of worms. In the best case, this is implementation-dependent. Now, I don't have access to the C89 standard, but C99 originally stated that the value of a union member other than the last one stored into is unspecified. This was changed in a TC to state that the values of bytes that don't correspond to the last stored-into member are unspecified, while the bytes that do correspond to it are reinterpreted as per the new type (something which is obviously implementation-dependent).

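    For C++, I can't find the language about the union hack in the standard. Anyway, C++ has reinterpret_cast<>, which is the idiomatic way to spell type-punning in C++ (use the reference variant of reinterpret_cast<>). Note, however, that reinterpret_cast<> is subject to the same strict-aliasing rule as the C-style cast, so it does not by itself make the access safe.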
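The char exception mentioned earlier is what makes copying bytes with std::memcpy a well-defined way to get at part of an object. Here is a minimal sketch of that idea, not something taken from the question: store_half and load_half are hypothetical helper names, and std::uint32_t / std::uint16_t are used in place of unsigned long / unsigned short on the assumption that a 32-bit buffer holding two 16-bit halves is what is wanted.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers, for illustration only (not from the original post).

// Overwrite the 16-bit half at position `index` (0 or 1, counted in memory
// order) of a 32-bit buffer. No unsigned short* alias is ever created:
// the bytes are reached through unsigned char*, which may alias any object.
inline void store_half(std::uint32_t& buffer, unsigned index, std::uint16_t value) {
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&buffer);
    std::memcpy(bytes + index * sizeof value, &value, sizeof value);
}

// Read the 16-bit half at position `index` (0 or 1, in memory order).
inline std::uint16_t load_half(const std::uint32_t& buffer, unsigned index) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&buffer);
    std::uint16_t value;
    std::memcpy(&value, bytes + index * sizeof value, sizeof value);
    return value;
}
```

Which half ends up in the low 16 bits still depends on the byte order of the machine, so this removes the aliasing problem but not the endianness dependence. Compilers typically turn such small fixed-size memcpy calls into plain loads and stores.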

    Anyway, you probably shouldn't be using type-punning at all (it is implementation-dependent); build up your values manually via bit-shifting instead.
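The bit-shifting approach could look roughly like the sketch below. All function names are made up for illustration, and fixed-width types are assumed so that the halves are exactly 16 bits wide regardless of how large unsigned long is on a given platform.

```cpp
#include <cstdint>

// Hypothetical helper names, for illustration only.

// Combine two 16-bit values into one 32-bit value.
constexpr std::uint32_t pack_halves(std::uint16_t high, std::uint16_t low) {
    return (static_cast<std::uint32_t>(high) << 16) | low;
}

// Extract the low / high 16 bits.
constexpr std::uint16_t low_half(std::uint32_t v) {
    return static_cast<std::uint16_t>(v & 0xFFFFu);
}
constexpr std::uint16_t high_half(std::uint32_t v) {
    return static_cast<std::uint16_t>(v >> 16);
}

// Return a copy of v with only the low / high 16 bits replaced.
constexpr std::uint32_t with_low_half(std::uint32_t v, std::uint16_t low) {
    return (v & 0xFFFF0000u) | low;
}
constexpr std::uint32_t with_high_half(std::uint32_t v, std::uint16_t high) {
    return (v & 0x0000FFFFu) | (static_cast<std::uint32_t>(high) << 16);
}

// Same sequence as in the question: start from 0xFFEEDDCC, set the low half
// to 0xAAAA, then the high half to 0xBBBB.
static_assert(with_low_half(0xFFEEDDCCu, 0xAAAA) == 0xFFEEAAAAu, "low half replaced");
static_assert(with_high_half(0xFFEEAAAAu, 0xBBBB) == 0xBBBBAAAAu, "high half replaced");
```

Because everything here is ordinary integer arithmetic, the result is the same on any byte order, and the optimizer has no aliasing assumptions to get wrong.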