x86, bit-manipulation, bit

Bit Hack to get location of bit change


I need to figure out a bit hack to find the first position where the bit value changes. It could be from 0 to 1 or from 1 to 0.

// This would give 0 since no changes happen
let x1 = 0b00000000

// This also gives 0 since there is no change
let x2 = 0b11111111

// This would give 1 since the value changes after the first bit
let x3 = 0b10000000

// This would also give 1 since the value change 
// happens at the first bit
let x4 = 0b01111111

// This would return 7 since the change is at the 7th bit
let x5 = 0b00000001

// This would also return 7
let x6 = 0b11111110

Any recommendations on an incantation that would give these results?


Solution

  • The key building block is a bit-scan like bsr (or 31-lzcnt).
    That finds the position of the highest set bit, and we can transform your input so it works for this.

    If the leading bit is set, NOT it before the bit-scan, like x = ((int32_t)x<0) ? ~x : x;. That compiles to mov ecx, eax / xor eax, -1 (like NOT, but it sets FLAGS) / cmovns ecx, eax. So it's like an absolute-value idiom, but with NOT instead of NEG. Another option is sar reg,31 (or cdq) / xor with that, to flip or not, as sketched below.
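
    As a quick sketch, that sar/xor flip looks like this in C++ (the helper name is mine, not from the answer; it relies on arithmetic right shift of negative signed values, which C++20 guarantees and which mainstream compilers have always provided):

    #include <stdint.h>

    // Broadcast the sign bit into a 0 / all-ones mask, then XOR with it.
    uint32_t flip_if_negative(uint32_t x)              // hypothetical helper
    {
        uint32_t mask = (uint32_t)((int32_t)x >> 31);  // sar reg,31: 0 if x >= 0, else 0xFFFFFFFF
        return x ^ mask;                               // XOR with 0 is a no-op; with all-ones it's NOT
    }

    The full function below uses a ternary instead; as the asm output further down shows, GCC and clang compile it to that same branchless mov/sar/xor sequence anyway.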

    #include <bit>          // C++20
    #include <stdint.h>
    
    int transition_position(uint32_t x)
    {
        x = ((int32_t)x<0) ? ~x : x;            // make the leading bits zero
        if (x == 0) __builtin_unreachable();    // optional; x |= 1 is another way to handle the all-bits-same case, allowing BSR without branching (see the variant at the end)
        return std::countl_zero(x) ^ 31;        // 31-lzcnt, written in a way that GCC can optimize back to bsr
    }
    

    This compiles nicely with GCC / clang: https://godbolt.org/z/63nEcnTso

    GCC 11.2 and clang 13 at -O3 both produce this asm:
    transition_position(unsigned int):
            mov     eax, edi
            sar     eax, 31            # missed optimization: should shift EDI so mov is off the critical path on CPUs without mov-elimination
            xor     eax, edi
            bsr     eax, eax
            ret
    

    With -march=haswell, they both de-optimize by using lzcnt and then an actual xor eax,31. (That asm is still better for AMD, where bsr is significantly slower than lzcnt/tzcnt, so it's good for -mtune=generic -mbmi, but not for -march=haswell.)
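
    As the comment in the function above mentions, x |= 1 is another way to handle the all-bits-same case without __builtin_unreachable(): setting the low bit never changes which bit is the highest set bit (unless x was zero), so all-zeros / all-ones inputs simply return 0. A sketch of that variant (the function name is mine, not from the answer):

    #include <bit>          // C++20
    #include <stdint.h>

    int transition_position_branchless(uint32_t x)
    {
        x = ((int32_t)x<0) ? ~x : x;            // make the leading bits zero
        x |= 1;                                 // guarantee x != 0: all-same inputs return 0, and bsr is safe
        return std::countl_zero(x) ^ 31;        // 31-lzcnt == bsr
    }

    For example, transition_position_branchless(0x80000000) returns 30 (bit 30 is the highest bit that differs from bit 31), while 0 and 0xFFFFFFFF both return 0.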