Search code examples
c++x86-64compiler-optimizationc++23

[[assume]] attribute not having an impact on code


I expect the multiply_by_pow_of_2 function below to have a more optimized code for multiplication since I used the [[ assume( std::has_single_bit( multiplier ) ) ]]; which is supposed to tell the compiler that 2nd parameter multiplier is always a power of 2. So maybe it could generate faster code. However it generates the same code (using imul):

multiply_by_pow_of_2(unsigned int, unsigned int):
        mov     eax, esi
        imul    eax, edi
        ret

With that said, calling this function in main and comparing its generated code with that of argc * 8 shows that both change to the usual lea (and not to something like a shift left):

        lea     esi, [0+rdi*8]

Here (live):

#include <bit>
#include <cstdio>


unsigned multiply_by_pow_of_2( unsigned num, unsigned multiplier ) noexcept
{
    [[ assume( std::has_single_bit( multiplier ) ) ]];
    return num * multiplier;
}

int main( int argc, char* argv[ ] )
{
    unsigned result;

    if constexpr ( true )
    {
        result = multiply_by_pow_of_2( argc, 8 );
    }
    else
    {
        result = argc * 8;
    }

    std::printf( "%d\n", result );
}

which gives (using -O3):

multiply_by_pow_of_2(unsigned int, unsigned int):
        mov     eax, esi
        imul    eax, edi
        ret
.LC0:
        .string "%d\n"
main:
        sub     rsp, 8
        lea     esi, [0+rdi*8] ; <- here
        xor     eax, eax
        mov     edi, OFFSET FLAT:.LC0
        call    printf
        xor     eax, eax
        add     rsp, 8
        ret

The function's code remains the same with or without the assumption statement. Am I failing to convince the compiler that multiplier is a power of 2? How can this be done?


Solution

  • First, multiplication instructions are so fast on current CPUs that it isn’t necessarily a big win to use shifts over multiplies.

    Second, to replace a multiplication by a shift, it isn’t enough to know that the operand has a single bit — you need to know at what bit position (say b) that bit is actually placed, so you can perform a left shift by b positions. If the compiler can’t infer that information, it’d need to e.g. emit a count trailing zeros instruction to find b, followed by the shift. And that’d be almost certainly more expensive than just the multiply.

    On the other hand, if the input is a constant, then the compiler can compute b in compile-time and emit the multiplication-less code as you witnessed.