python c++c bit-manipulation bitwise-operators

Bit manipulation to convert leftmost set bits to right-side alternating bits?

I'm trying to maximally optimize a low-level subroutine, but I can't figure out the fastest way to flip the bits in this specific case:

Given a binary integer n wherein all set bits are to the left of all clear bits (e.g. 11110000, 110000, 1110000), is it possible to produce a resulting binary integer of digit length ((number of set bits in n) - 2) * 2, with all even bits set and all odd bits clear?

Example:

n = 111000, answer: 10
n = 1111000, answer: 1010
n = 110, answer: 0
n = 111110000000, answer: 101010
n = 1111111111000000000, answer: 1010101010101010

n is guaranteed to have at least 2 set bits, and at least (set bits - 1) clear bits

The answer must utilize only a constant number of bit manipulation and/or arithmetic operations (i.e. loopless), and can't use any type conversions (integers only, no strings).

Solution

One approach could be to use these steps:

Right-justify (discard trailing zeroes)
Get rid of 2 set bits
Double the number of set bits
Mask out the even bits

For example, using only "basic" operations:

// right-justify
x = x / (x & -x)
// get rid of 2 set bits
x >>= 2
// double the number of set bits
x *= x + 2
// mask out even bits
x &= 0xAAAAAAAAAAAAAAAA

That step "double the number of set bits" relies on x being a power of two minus one at that point. If x can be written as 2^k-1 then x * (x + 2) will be (2^k-1) * (2^k+1) = 2^2k-1 so it doubles the number of set bits.

Division is not so nice, if you had a fast tzcnt then you can right-justify with:

x >>= tzcnt(x)

With fast pdep (Intel Haswell and newer, works on AMD Ryzen but slowly), doubling the number of set bits can be avoided,

// spread out the bits to even positions
x = pdep(x, 0xAAAAAAAAAAAAAAAA)

And fast pext could be used as an alternative right-justify,

// right-justify
x = pext(x, x)

With the common popcnt a more direct approach could be used, counting the number of set bits, subtracting two, then generating a pattern of that size for example by shifting 0xAAAAAAAAAAAAAAAA right until it is short enough, or by using bzhi to truncate it at the top.