Search code examples
coperatorslanguage-lawyerbit-shiftinteger-promotion

Why does it make a difference if left and right shift are used together in one expression or not?


I have the following code:

unsigned char x = 255;
printf("%x\n", x); // ff

unsigned char tmp = x << 7;
unsigned char y = tmp >> 7;
printf("%x\n", y); // 1

unsigned char z = (x << 7) >> 7;
printf("%x\n", z); // ff

I would have expected y and z to be the same. But they differ depending on whether a intermediary variable is used. It would be interesting to know why this is the case.


Solution

  • This little test is actually more subtle than it looks as the behavior is implementation defined:

    • unsigned char x = 255; no ambiguity here, x is an unsigned char with value 255, type unsigned char is guaranteed to have enough range to store 255.

    • printf("%x\n", x); This produces ff on standard output but it would be cleaner to write printf("%hhx\n", x); as printf expects an unsigned int for conversion %x, which x is not. Passing x might actually pass an int or an unsigned int argument.

    • unsigned char tmp = x << 7; To evaluate the expression x << 7, x being an unsigned char first undergoes the integer promotions defined in the C Standard 6.3.3.1: If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.

      So if the number of value bits in unsigned char is smaller or equal to that of int (the most common case currently being 8 vs 31), x is first promoted to an int with the same value, which is then shifted left by 7 positions. The result, 0x7f80, is guaranteed to fit in the int type, so the behavior is well defined and converting this value to type unsigned char will effectively truncate the high order bits of the value. If type unsigned char has 8 bits, the value will be 128 (0x80), but if type unsigned char has more bits, the value in tmp can be 0x180, 0x380, 0x780, 0xf80, 0x1f80, 0x3f80 or even 0x7f80.

      If type unsigned char is larger than int, which can occur on rare systems where sizeof(int) == 1, x is promoted to unsigned int and the left shift is performed on this type. The value is 0x7f80U, which is guaranteed to fit in type unsigned int and storing that to tmp does not actually lose any information since type unsigned char has the same size as unsigned int. So tmp would have the value 0x7f80 in this case.

    • unsigned char y = tmp >> 7; The evaluation proceeds the same as above, tmp is promoted to int or unsigned int depending on the system, which preserves its value, and this value is shifted right by 7 positions, which is fully defined because 7 is less than the width of the type (int or unsigned int) and the value is positive. Depending on the number of bits of type unsigned char, the value stored in y can be 1, 3, 7, 15, 31, 63, 127 or 255, the most common architecture will have y == 1.

    • printf("%x\n", y); again, it would be better t write printf("%hhx\n", y); and the output may be 1 (most common case) or 3, 7, f, 1f, 3f, 7f or ff depending on the number of value bits in type unsigned char.

    • unsigned char z = (x << 7) >> 7; The integer promotion is performed on x as described above, the value (255) is then shifted left 7 bits as an int or an unsigned int, always producing 0x7f80 and then right shifted by 7 positions, with a final value of 0xff. This behavior is fully defined.

    • printf("%x\n", z); Once more, the format string should be printf("%hhx\n", z); and the output would always be ff.

    Systems where bytes have more than 8 bits are becoming rare these days, but some embedded processors, such as specialized DSPs still do that. It would take a perverse system to fail when passed an unsigned char for a %x conversion specifier, but it is cleaner to either use %hhx or more portably write printf("%x\n", (unsigned)z);

    Shifting by 8 instead of 7 in this example would be even more contrived. It would have undefined behavior on systems with 16-bit int and 8-bit char.