
How "bitwise AND mask equals mask" can be optimized?


How "bitwise AND mask equals mask" can be optimized?

Example:

bool foo(uint64_t x)
{
    return (x & 0x7ff0000000000000) == 0x7ff0000000000000;
}

leads to (ARM 32-bit):

gcc 12.1 (linux) -O3:
f:
        movs    r3, #0
        movt    r3, 32752
        bics    r3, r3, r1
        ite     eq
        moveq   r0, #1
        movne   r0, #0
        bx      lr

armv7-a clang 11.0.1 -O3:
f:
        mov     r0, #267386880
        orr     r0, r0, #1879048192
        bic     r0, r0, r1
        rsbs    r1, r0, #0
        adc     r0, r0, r1
        bx      lr

Can the C code above be rewritten so that faster assembly code is produced?

Perhaps there are relevant bit-twiddling hacks, or combinations of them, or something similar?


Solution

  • One option is

    bool foo4(uint64_t x)
    {
        return (((x << 1) >> 53) + 1) >> 11;
    }
    

    which compiles with gcc to

    foo:
            ubfx    r0, r1, #20, #11
            adds    r0, r0, #1
            ubfx    r0, r0, #11, #1
            bx      lr
    

    The saving here mostly comes from not having to convert a flags result into 0/1, but generating a 1 bit directly: the shifts isolate the 11-bit field at bits 52..62, and adding 1 carries into bit 11 only when that field is all ones. If this function is inlined and the result is used for a branch, this is not helpful and might actually result in slower code.
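
    The equivalence can be sanity-checked on a few representative bit
    patterns. The harness below is a hypothetical sketch (the main
    function and the value list are not from the answer); it compares
    the original mask test against the shift-based version, with ULL
    suffixes added so the 64-bit constants stay well-formed on 32-bit
    targets.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Original version: mask the exponent field and compare. */
    static bool foo(uint64_t x)
    {
        return (x & 0x7ff0000000000000ULL) == 0x7ff0000000000000ULL;
    }

    /* Shift-based version: drop the sign bit, move the 11-bit field
       (bits 52..62) to the bottom, and add 1 so that only an all-ones
       field carries into bit 11. */
    static bool foo4(uint64_t x)
    {
        return (((x << 1) >> 53) + 1) >> 11;
    }

    int main(void)
    {
        /* Edge cases: +/-infinity, a NaN pattern, 1.0, the largest
           finite double, zero, and all-ones. */
        const uint64_t tests[] = {
            0x7ff0000000000000ULL, 0xfff0000000000000ULL,
            0x7ff8000000000001ULL, 0x3ff0000000000000ULL,
            0x7fefffffffffffffULL, 0x0000000000000000ULL,
            0xffffffffffffffffULL,
        };

        for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++) {
            if (foo(tests[i]) != foo4(tests[i]))
                printf("mismatch for 0x%016llx\n",
                       (unsigned long long)tests[i]);
        }
        return 0;
    }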