Bit-shifting unsigned longs in C

I found a bug in a piece of code I wrote, and have fixed it, but still can't explain what was happening. It boils down to this:

unsigned i = 1<<31;          // gives 2147483648 as expected
unsigned long l = 1<<31;     // gives 18446744071562067968 as not expected

I'm aware of the question "Unsigned long and bit shifting", where the exact same number shows up as an unexpected value. There, though, the asker was using a signed char, which I believe led to sign extension. I really can't for the life of me see why I'm getting an incorrect value here.

I'm using CLion on Ubuntu 18.04, and on my system an unsigned is 32 bits and a long is 64 bits.

Solution

In this expression:

1<<31

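The constant 1 has type int, so the shift is performed in int regardless of the type of the variable you assign the result to. Assuming an int is 32 bits wide, you're shifting a 1 into the sign bit. Because 1×2^31 is not representable in a 32-bit int, doing so is undefined behavior.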

This is documented in section 6.5.7p4 of the C standard:

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1×2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1×2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

However, since you're on Ubuntu, which uses GCC, the behavior is actually defined by the implementation. The GCC documentation states:

Bitwise operators act on the representation of the value including both the sign and value bits, where the sign bit is considered immediately above the highest-value value bit. Signed >> acts on negative numbers by sign extension.

As an extension to the C language, GCC does not use the latitude given in C99 and C11 only to treat certain aspects of signed << as undefined. However, -fsanitize=shift (and -fsanitize=undefined) will diagnose such cases. They are also diagnosed where constant expressions are required.

So in this case GCC works directly on the representation of the values. This means that 1<<31 has type int and the representation 0x80000000. Interpreted as a 32-bit two's complement int, that representation has the decimal value -2147483648.

When this value is assigned to an unsigned int, it is converted via the rules in section 6.3.1.3p2:

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

Since "one more than the maximum value" is 4294967296 for a 32-bit unsigned int, the int value -2147483648 is converted to the unsigned int value 4294967296 - 2147483648 == 2147483648.

When 1<<31 is assigned to a 64-bit unsigned long int, "one more than the maximum value" is 18446744073709551616. The result of the conversion is therefore 18446744073709551616 - 2147483648 == 18446744071562067968, which is the value you're getting.

To get the correct value, use the UL suffix so that the constant, and therefore the shift, has type unsigned long:

1UL<<31