x86, bit-manipulation, bit

Bit Hack to get location of bit change


I need to figure out a bit hack to find the first position where the bit value changes. It could be from 0 to 1 or from 1 to 0.

// This would give 0 since no changes happen
let x1 = 0b00000000

// This also gives 0 since there is no change
let x2 = 0b11111111

// This would give 1 since the value changes after the first bit
let x3 = 0b10000000

// This would also give 1 since the value change 
// happens at the first bit
let x4 = 0b01111111

// This would return 7 since the change is at the 7th bit
let x5 = 0b00000001

// This would also return 7
let x6 = 0b11111110

Any recommendations on an incantation that would give these results?


Solution

  • The key building block is a bit-scan like bsr (or 31-lzcnt).
    That finds the position of the highest set bit, and we can transform your input so it works for this.

    If the leading bit is set, NOT it before the bit-scan, like x = ((int32_t)x<0) ? ~x : x;. That compiles to mov ecx, eax / xor eax, -1 (like NOT, but it sets FLAGS) / cmovns ecx, eax. So it's like an absolute-value idiom, but with NOT instead of NEG. Another option is sar reg,31 (or cdq) / xor with that, to flip or not, as sketched below.
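
    As a quick sketch, that sar/xor flip looks like this in C++ (the helper name is mine, not from the answer; it relies on arithmetic right shift of negative signed values, which C++20 guarantees and which mainstream compilers have always provided):

    #include <stdint.h>

    // Broadcast the sign bit into a 0 / all-ones mask, then XOR with it.
    uint32_t flip_if_negative(uint32_t x)              // hypothetical helper
    {
        uint32_t mask = (uint32_t)((int32_t)x >> 31);  // sar reg,31: 0 if x >= 0, else 0xFFFFFFFF
        return x ^ mask;                               // XOR with 0 is a no-op; with all-ones it's NOT
    }

    The full function below uses a ternary instead; as the asm output further down shows, GCC and clang compile it to that same branchless mov/sar/xor sequence anyway.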

    #include <bit>          // C++20
    #include <stdint.h>
    
    int transition_position(uint32_t x)
    {
        x = ((int32_t)x<0) ? ~x : x;            // make the leading bits zero
        if (x == 0) __builtin_unreachable();    // optional; x |= 1 is another way to handle the all-bits-same case, allowing BSR without branching (see the variant at the end)
        return std::countl_zero(x) ^ 31;        // 31-lzcnt, written in a way that GCC can optimize back to bsr
    }
    

    This compiles nicely with GCC / clang: https://godbolt.org/z/63nEcnTso

    GCC 11.2 and clang 13 at -O3 both produce this asm:
    transition_position(unsigned int):
            mov     eax, edi
            sar     eax, 31            # missed optimization: should shift EDI so mov is off the critical path on CPUs without mov-elimination
            xor     eax, edi
            bsr     eax, eax
            ret
    

    With -march=haswell, they both de-optimize by using lzcnt and then an actual xor eax,31. (That asm is still better for AMD, where bsr is significantly slower than lzcnt/tzcnt, so it's good for -mtune=generic -mbmi, but not for -march=haswell.)
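
    As the comment in the function above mentions, x |= 1 is another way to handle the all-bits-same case without __builtin_unreachable(): setting the low bit never changes which bit is the highest set bit (unless x was zero), so all-zeros / all-ones inputs simply return 0. A sketch of that variant (the function name is mine, not from the answer):

    #include <bit>          // C++20
    #include <stdint.h>

    int transition_position_branchless(uint32_t x)
    {
        x = ((int32_t)x<0) ? ~x : x;            // make the leading bits zero
        x |= 1;                                 // guarantee x != 0: all-same inputs return 0, and bsr is safe
        return std::countl_zero(x) ^ 31;        // 31-lzcnt == bsr
    }

    For example, transition_position_branchless(0x80000000) returns 30 (bit 30 is the highest bit that differs from bit 31), while 0 and 0xFFFFFFFF both return 0.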