c, bit-manipulation, bit-shift

Purpose of integer literal suffix in left shift


In C, many operations employ bit shifting, in which an integer literal is often used. For example, consider the following code snippet:

#define test_bit(n, flag) (1UL << (n) & (flag))

As far as I know, the integer literal suffix UL is meant to suppress unwanted behavior in a shift; for example, sign-extending a signed integer may leave multiple bits set. However, if the code only does a left shift, as shown above, is the suffix still needed?

Since a left shift won't cause that kind of unintended behavior, I can't figure out what the suffix is for. Code like the above often appears in projects such as the Linux kernel, which makes me think there must be a need for it. Does anyone know the purpose of the UL suffix in this case?


Solution

  • Sign extending only applies to right shifts, so that's not applicable.


    << is defined as follows:

    C23 §6.5.7 ¶4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, wrapped around. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

    There are two ways in which left-shifting values can result in undefined behaviour based on E1:[1]

    • E1 has a signed type and negative value.
    • E1 has a signed type and nonnegative value, and E1 × 2^E2 is unrepresentable.

    In our case, E1 is a positive value, so the former isn't applicable. However, the latter could apply depending on the type of E1.
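The two conditions above can be turned into a run-time check. The following is only a minimal sketch for a left operand of type int; the function name left_shift_is_defined is hypothetical, and the width calculation assumes int has no padding bits. It also rejects shift counts that are negative or at least as large as the width, since those are undefined as well.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether e1 << e2 has defined behaviour
   when e1 has type int. Assumes int has no padding bits. */
static bool left_shift_is_defined(int e1, int e2)
{
    const int width = (int)(sizeof(int) * CHAR_BIT);

    if (e1 < 0)                  /* signed type and negative value */
        return false;
    if (e2 < 0 || e2 >= width)   /* shift count out of range */
        return false;

    /* e1 * 2^e2 is representable exactly when e1 <= INT_MAX / 2^e2.
       Right-shifting the nonnegative INT_MAX is well defined. */
    return e1 <= (INT_MAX >> e2);
}
```

With a 32-bit int, left_shift_is_defined(1, 30) is true and left_shift_is_defined(1, 31) is false, which is the case that makes a plain 1 << (n) unsafe for the top bit.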

    Let's look at what results we get for different types on two systems.

    • System "L" has a 32-bit int and a 64-bit long (e.g. Linux on x86-64).
    • System "W" has a 32-bit int and a 32-bit long (e.g. Windows on x86-64).
    Implementation   Usage                   Result on "L"                                                        Result on "W"
    --------------   ---------------------   ------------------------------------------------------------------   -------------------
    1 << (n)         test_bit( 31, flag )    Undefined behaviour                                                  Undefined behaviour
    1L << (n)        test_bit( 31, flag )    ok (since long is 64 bits)                                           Undefined behaviour
    1U << (n)        test_bit( 31, flag )    ok                                                                   ok
    1U << (n)        test_bit( 63, flag )    Incorrect result (undefined behaviour: n ≥ width of unsigned int)    -
    1L << (n)        test_bit( 63, flag )    Undefined behaviour                                                  -
    1UL << (n)       test_bit( 63, flag )    ok                                                                   -

    So, assuming you want to be able to test any of the bits of flag:

    • 1U is needed if flag can be a signed int or an unsigned int or shorter.
    • 1UL is needed if flag can also be a signed long or an unsigned long.
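As a rough illustration of the 1UL case, the sketch below uses the macro from the question to test the highest bit of an unsigned long, which is bit 63 on system "L" and bit 31 on system "W". The helpers top_bit_index and top_bit_is_set are hypothetical names, and the width calculation assumes unsigned long has no padding bits.

```c
#include <limits.h>
#include <stdbool.h>

/* Same macro as in the question: the mask 1UL has type unsigned long. */
#define test_bit(n, flag) (1UL << (n) & (flag))

/* Hypothetical helper: index of the highest bit of unsigned long
   (63 on system "L", 31 on system "W"). Assumes no padding bits. */
static unsigned top_bit_index(void)
{
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT) - 1u;
}

/* Hypothetical helper: the shift is well defined because the left
   operand is unsigned and the count is below its width. */
static bool top_bit_is_set(unsigned long flag)
{
    return test_bit(top_bit_index(), flag) != 0;
}
```

Replacing 1UL with 1 or 1L in the macro would make this same call undefined on at least one of the two systems, because the shifted value would no longer be representable in the signed type.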

    1. Undefined behaviour can also result based on the value of E2. This happens if E2 is negative, or greater than or equal to the width of the promoted E1. This puts a constraint on the valid values for test_bit's first argument.
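One way to enforce that constraint is to validate the bit index before shifting. This is only a sketch and not part of the original macro: test_bit_checked is a hypothetical function, it treats an out-of-range index as "bit not set" (an arbitrary choice for illustration), and it assumes unsigned long has no padding bits.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical range-checked variant of test_bit. */
static bool test_bit_checked(int n, unsigned long flag)
{
    const int width = (int)(sizeof(unsigned long) * CHAR_BIT);

    /* A negative count, or one >= the width of unsigned long, would
       make the shift undefined, so reject it before shifting. */
    if (n < 0 || n >= width)
        return false;

    return ((1UL << n) & flag) != 0;
}
```

A function costs the macro's ability to be used in constant expressions, so this is a trade-off rather than a drop-in replacement.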