I expect the multiply_by_pow_of_2
function below to have a more optimized code for multiplication since I used the [[ assume( std::has_single_bit( multiplier ) ) ]];
which is supposed to tell the compiler that 2nd parameter multiplier
is always a power of 2. So maybe it could generate faster code. However it generates the same code (using imul
):
multiply_by_pow_of_2(unsigned int, unsigned int):
mov eax, esi
imul eax, edi
ret
With that said, calling this function in main and comparing its generated code with that of argc * 8
shows that both change to the usual lea
(and not to something like a shift left):
lea esi, [0+rdi*8]
Here (live):
#include <bit>
#include <cstdio>
unsigned multiply_by_pow_of_2( unsigned num, unsigned multiplier ) noexcept
{
[[ assume( std::has_single_bit( multiplier ) ) ]];
return num * multiplier;
}
int main( int argc, char* argv[ ] )
{
unsigned result;
if constexpr ( true )
{
result = multiply_by_pow_of_2( argc, 8 );
}
else
{
result = argc * 8;
}
std::printf( "%d\n", result );
}
which gives (using -O3
):
multiply_by_pow_of_2(unsigned int, unsigned int):
mov eax, esi
imul eax, edi
ret
.LC0:
.string "%d\n"
main:
sub rsp, 8
lea esi, [0+rdi*8] ; <- here
xor eax, eax
mov edi, OFFSET FLAT:.LC0
call printf
xor eax, eax
add rsp, 8
ret
The function's code remains the same with or without the assumption statement. Am I failing to convince the compiler that multiplier
is a power of 2? How can this be done?
First, multiplication instructions are so fast on current CPUs that it isn’t necessarily a big win to use shifts over multiplies.
Second, to replace a multiplication by a shift, it isn’t enough to know that the operand has a single bit — you need to know at what bit position (say b) that bit is actually placed, so you can perform a left shift by b positions. If the compiler can’t infer that information, it’d need to e.g. emit a count trailing zeros instruction to find b, followed by the shift. And that’d be almost certainly more expensive than just the multiply.
On the other hand, if the input is a constant, then the compiler can compute b in compile-time and emit the multiplication-less code as you witnessed.