While there are many answers to the question of how to find the next power of two of an integer or floating-point number, there aren't as many for finding the nearest power of two of a number.
I already implemented the following :
template <typename T>
static constexpr T round_pow2(T v) {
if constexpr (std::is_floating_point_v<T>) {
auto high = static_cast<unsigned long>(std::ceil(v));
auto low = static_cast<unsigned long>(std::floor(v));
if (high == low) {
return round_pow2<unsigned long>(high);
} else {
T a = static_cast<T>(round_pow2<unsigned long>(low));
T b = static_cast<T>(round_pow2<unsigned long>(high));
return std::abs(a - v) <= std::abs(b - v) ? a : b;
}
} else {
T high = v - 1;
for (T i = 1; i < static_cast<T>(sizeof(T)); i *= 2) {
high |= high >> i;
}
high += 1;
T low = high >> 1;
return (high - v) < (v - low) ? high : low;
}
}
Which should work for any non negative and lower than ULLONG_MAX fp value. But this doesn't seem to be optimal. Is there a better (more performant) way of implementing this function ?
Edits: @Eric for my application getting 0 is fine in case v == 0. But this could indeed be a problem to some because 0 is not a power of 2.
@Blixodus Thank you for your answer, it pointed me in the right direction. I created the following function with your idea in mind:
template <typename T>
constexpr T round_p2(T v) {
if constexpr (std::is_floating_point_v<T>) {
using R = std::conditional_t<
std::is_same_v<T, double>, uint64_t,
std::conditional_t<std::is_same_v<T, float>, uint32_t,
void
>>;
auto [mlen, es, em] = std::is_same_v<T, double> ? std::make_tuple(52, 1024, 0x7FF) : std::make_tuple(23, 128, 0xFF);
auto y = *reinterpret_cast<R*>(&v);
return (T(y >> (sizeof(R) * 8 - 1)) * -2 + 1) * (2 << (((y >> mlen) & em) - es + ((y >> mlen - 1) & 0x1)));
} else {
using R = std::make_unsigned_t<T>;
R rv = static_cast<R>(v);
T sign = 1;
if constexpr (std::is_signed_v<T>) {
if (v < 0) {
rv = static_cast<R>(-v);
sign = -1;
}
}
R high = rv - 1;
for (R i = 1; i < static_cast<R>(sizeof(R)); i *= 2) {
high |= high >> i;
}
high += 1;
R low = high >> 1;
return sign * static_cast<T>((high - rv) <= (rv - low) ? high : low);
}
}
It seems to works quite well for my application and generates really nice assembly compared to my first implementation.
Explanation: I am first getting three magic values depending on if v is a float or a double: the first is the length of the mantissa, the second is the amount the exponent need to be decremented by (plus one), and the third is the mask used to extract the exponent from the fp representation.
I then cast the fp value into an unsigned integer of the same size to be able to work on it.
Next, I extract the value which will sign the final result using (T(y >> (sizeof(R) * 8 - 1)) * -2 + 1)
. This extracts the highest bit of the fp value (the sign, 0 for positive and 1 for negative), and then apply the function f(x) = x * -2 + 1
to it, which gives 1
for x=0
and -1
for x=1
.
Eventually, I compute the closest unsigned power of two of the given fp value using Blixodus' formula. Because I don't use the std::pow
function in favor of a bit shift (because we're working with powers of two). I need to account for this by removing one to the value we're shifting 2 with (hence the value of es
being 1 more than expected).
Prior to C++ 2020, there is no implementation for rounding integers to a power of two that does not use looping, a compiler extension, or bit-shifting that presumes some limit on the width of the type. There are several Stack Overflow questions about that. The code below shows a solution using the C++ std::countl_zero
and an alternate solution using GCC’s count-leading-zeros builtin, which will work for any integer type up to unsigned long long
.
#include <cmath>
#if 201703L <= __cplusplus
#include <bit>
#include <type_traits>
#endif
template <typename T> static constexpr T round_pow2(T v)
{
/* Since one is the smallest power of two, all numbers less than or equal
to one round to one.
*/
if (v <= 1) return 1;
/* For floating-point, the standard frexp function gives us the fraction
and exponent, and ldexp applies an exponent. The fraction is scaled to
[.5, 1), so, if it is less than or equal to .75, we round down.
*/
if constexpr (std::is_floating_point_v<T>)
{
int exponent = 0;
T fraction = frexp(v, &exponent);
return ldexp(.5, exponent + (.75 < fraction));
}
/* Here we handle integer types. The midpoints for rounding to powers of
two are at 3*2^n. That is, the transition between rounding to one
power of two and another occurs at a number that has the form 3*2^n.
To find which interval v is in, we can divide it by three and then
find the next lower (instead of nearest) power of two. To get the
desired rounding at the midpoint, we use v-1. So the general algorithm
is to round (v-1)/3 down to the nearest power of two, then quadruple
that. For example:
v v-1 (v-1)/3 rounded down quadrupled
11 10 3 2 8
12 11 3 2 8
13 12 4 4 16
Note that (v-1)/3 is not quite right for v=2, as the subtraction of 1
jumps a full power of two, from 2 to 1. (v-.01)/3 would work, but we
want to stick with integer arithmetic.
For the general case, we want 4 * 2**floor(log2((v-1)/3)). To include
v=2, we will use 2 * 2**f((v-1)/3), where f(x) is floor(log2(x))+1 but
clamped to produce at least zero.
If the C++ 2020 std::countl_zero function is available, we use that.
Otherwise, we use the GCC builtin __builtin_clzll. In either case, the
function returns the number of leading zero bits, which depends on the
width of the type rather than the operand value alone. To calculate
the power of two, we get the bit count for a fixed value (zero or one)
as a reference point.
*/
else
{
#if __cpp_lib_bitops
/* std::countl_zero is only provided for unsigned types, so
define UT to be the unsigned type corresponding to T.
*/
using UT = std::make_unsigned<T>::type;
return static_cast<UT>(2) << std::countl_zero<UT>(0) - std::countl_zero<UT>((v-1)/3));
#else
/* Since __builtin_clzll is not defined for zero operands, we need
to ensure its operand is at least 1. To do this, we change
(v-1)/3 to (v-1)/3*2+1. The doubling increases the power of
two by one, so we change the reference point from zero to one,
decreasing the number of bits for it by one.
*/
return 2ull << __builtin_clzll(1) - __builtin_clzll((v-1)/3*2+1);
#endif
}
}