I'm trying to maximally optimize a low-level subroutine, but I can't figure out the fastest way to flip the bits in this specific case:
Given a binary integer n
wherein all set bits are to the left of all clear bits (e.g. 11110000
, 110000
, 1110000
), is it possible to produce a resulting binary integer of digit length ((number of set bits in n) - 2) * 2
, with all even bits set and all odd bits clear?
Example:
n = 111000, answer: 10
n = 1111000, answer: 1010
n = 110, answer: 0
n = 111110000000, answer: 101010
n = 1111111111000000000, answer: 1010101010101010
n
is guaranteed to have at least 2 set bits
, and at least (set bits
- 1) clear bits
The answer must utilize only a constant number of bit manipulation and/or arithmetic operations (i.e. loopless), and can't use any type conversions (integers only, no strings).
One approach could be to use these steps:
For example, using only "basic" operations:
// right-justify
x = x / (x & -x)
// get rid of 2 set bits
x >>= 2
// double the number of set bits
x *= x + 2
// mask out even bits
x &= 0xAAAAAAAAAAAAAAAA
That step "double the number of set bits" relies on x
being a power of two minus one at that point. If x
can be written as 2k-1 then x * (x + 2)
will be (2k-1) * (2k+1) = 22k-1 so it doubles the number of set bits.
Division is not so nice, if you had a fast tzcnt
then you can right-justify with:
x >>= tzcnt(x)
With fast pdep
(Intel Haswell and newer, works on AMD Ryzen but slowly), doubling the number of set bits can be avoided,
// spread out the bits to even positions
x = pdep(x, 0xAAAAAAAAAAAAAAAA)
And fast pext
could be used as an alternative right-justify,
// right-justify
x = pext(x, x)
With the common popcnt
a more direct approach could be used, counting the number of set bits, subtracting two, then generating a pattern of that size for example by shifting 0xAAAAAAAAAAAAAAAA
right until it is short enough, or by using bzhi
to truncate it at the top.