A handy method to verify if a positive integer n
is a power of two (like 1, 2, 4, 8, etc.) is to use the following test for having no more than 1 bit set:
bool test = n & (n - 1) == 0;
This operation can be very efficient because it only involves subtraction, a bitwise AND and a conditional branch on the Zero Flag (ZF). If this expression is evaluated to true
, then the number n
is indeed a power of two.
Another method uses the std::popcount
(population count) function, which is part of the C++20 standard library, for the test:
bool test = std::popcount(n) == 1; // (Since C++20)
This function counts the number of set bits (1s) in. If the hardware supports a popcount instruction (POPCNT), this function can be very fast.
In C++, you generally “pay for what you use”. For this test there is no use for counting.
What is the better method, in terms of CPU efficiency?
You know your number is positive (which excludes zero) so you can indeed just use n & (n-1) == 0
without checking n != 0
. That's your most efficient option, potentially more efficient than C++20 std::has_single_bit
std::has_single_bit
needs to rule out the no-bits-set case. For that, it can be slightly more efficient on modern x86 to do popcount(n) == 1
if the compiler can assume support for a hardware popcnt
instruction, which is why std::has_single_bit
is often defined that way in C++ standard libraries.
But since you know your number is non-zero, the bithack is most efficient. Especially if compiling for a target where the compiler can't assume a hardware popcount (like x86 without -march=x86-64-v2
or newer), or AArch64 before ARMv9.x where scalar popcount requires copying to vector regs and back. RISC-V only has hardware popcount in an uncommon extension, not baseline.
On x86, it can be as cheap as
lea eax, [rdi-1]
test eax, edi
# boolean condition in ZF
jz or setz or cmovz or whatever
And similar on AArch64 with sub
and tst
. And pretty much any other modern RISC can subtract 1 while putting the result into a separate register, then AND those together cheaply.
And if you're compiling for -march=x86-64-v3
or later (BMI2 + AVX2 + FMA = Haswell feature set), the compiler can use BMI1 blsr eax, edi
to clear ("reset") the lowest set bit and set FLAGS accordingly, with ZF set according to the output being zero. CF is set if the input was zero so some branch conditions can check that n!=0
. But unfortunately conditions like jbe
are CF=1 or ZF=1
, ja
is CF=0 and ZF=0
. There isn't a single FLAGS condition that checks for ZF=1 and CF=0 which would let std::has_single_bit
compile to just blsr
plus a single branch, cmov, or setcc. ja
is taken if the input had multiple bits set, not-taken for power-of-2 or zero.
Unlike test
, blsr
can't macro-fuse with a later jcc
so it doesn't save any uops vs. lea
/test
if you're branching on it. It is better on Intel if the compiler is using it branchlessly, like for setnz
or cmovnz
. blsr
is a single uop on Intel, and on AMD Zen 4 and later. 2 uops on Zen 3 and earlier (https://uops.info/)
For use-cases where you can't assume a non-zero input and don't have cheap hardware popcount, there's an alternate bithack that's 3 operations instead of two: (n - 1) < (n ^ (n - 1))
The right hand side is what x86 BMI1 blsmsk
computes, but if we have blsmsk
we have popcnt
.
n-1
is a common subexpression so we only need to compute it once. For example RISC-V; AArch64 could be similar with cmp/bltu
addi x1, x0, -1 # n-1
xor x2, x1, x0 # n ^ (n-1)
bltu x1, x2, power_of_two # branch on a comparison
For zero, n-1
is UINT_MAX, and n ^ anything
is a no-op, so both sides are equal.
For a power of two, n-1
sets all the bits below where the set bit was, and XOR sets that bit again. So it's larger than n
, and also larger than n-1
.
For a non-power-of-two, n ^ (n-1)
is still just a mask up to and including the lowest set bit, with the high bits cancelled (the ones that n-1
didn't flip). So it's smaller than n-1
.
https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2 also suggests v && !(v & (v - 1));
but I don't think that's better since logical &&
has to check both sides for non-zero.