How "bitwise AND mask equals mask" can be optimized?
Example:
bool foo(uint64_t x)
{
    return (x & 0x7ff0000000000000ULL) == 0x7ff0000000000000ULL;
}
leads to (ARM 32-bit):
gcc 12.1 (linux) -O3:
f:
        movs    r3, #0
        movt    r3, 32752
        bics    r3, r3, r1
        ite     eq
        moveq   r0, #1
        movne   r0, #0
        bx      lr
armv7-a clang 11.0.1 -O3:
f:
        mov     r0, #267386880
        orr     r0, r0, #1879048192
        bic     r0, r0, r1
        rsbs    r1, r0, #0
        adc     r0, r0, r1
        bx      lr
Can the C code above be rewritten so that the compiler produces faster assembly?
Are there relevant bit-twiddling hacks, or combinations of them, that would help here?
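One well-known identity (an aside, not from the original post) is that (x & m) == m holds exactly when (m & ~x) == 0; the bic/bics instructions in both outputs above already compute this form on the high word, since the low 32 bits of the mask are zero. A minimal sketch, using the hypothetical name foo2:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical variant (the name foo2 is not from the post): rewrites
   (x & m) == m as (m & ~x) == 0, which is what the bic/bics in the
   compiler output above already computes on the high 32 bits of x. */
bool foo2(uint64_t x)
{
    return (0x7ff0000000000000ULL & ~x) == 0;
}

On its own this identity does not buy much here: the comparison result still has to be materialized as 0/1, which is where the remaining instructions go.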
One option is
bool foo4(uint64_t x)
{
    return (((x << 1) >> 53) + 1) >> 11;
}
which compiles with gcc to
foo:
        ubfx    r0, r1, #20, #11
        adds    r0, r0, #1
        ubfx    r0, r0, #11, #1
        bx      lr
The saving here comes mostly from not having to convert a flags result into 0/1: (x << 1) >> 53 isolates the eleven mask bits, adding 1 carries into bit 11 exactly when all of them are set, and the final shift extracts that carry bit directly. If this function is inlined and the result is only used for a branch, this is not helpful and might actually result in slower code.
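As a quick sanity check (not part of the original post), here is a minimal harness, assuming foo and foo4 are defined exactly as above, that confirms both forms agree on a few boundary bit patterns:

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

bool foo(uint64_t x)
{
    return (x & 0x7ff0000000000000ULL) == 0x7ff0000000000000ULL;
}

bool foo4(uint64_t x)
{
    return (((x << 1) >> 53) + 1) >> 11;
}

int main(void)
{
    const uint64_t tests[] = {
        0,                          /* all mask bits clear             */
        0x7ff0000000000000ULL,      /* exactly the mask                */
        0x7fefffffffffffffULL,      /* one mask bit clear              */
        0xfff8000000000000ULL,      /* mask plus sign and low bits set */
        UINT64_MAX,                 /* all bits set                    */
        1,                          /* no mask bits set                */
        0x0010000000000000ULL,      /* only the lowest mask bit set    */
    };
    for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++)
        assert(foo(tests[i]) == foo4(tests[i]));
    puts("foo and foo4 agree on all test patterns");
    return 0;
}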