c++ precision division numerical-methods numerical-stability

Handling division of double by a large power of 2 in C++


My question is about how to handle division by a large power of 2. It is probably a trivial matter, but I did not find any helpful material. Basically, I am asking which (if either) of the two methods proposed at the end is more sensible, and whether there is a better way to achieve my goal.

The context is that, in a project I am working on, I have to compute a sum of double values w of the form w = (c_1/2)*(c_2/2)*...*(c_n/2), where c_1,...,c_n are other double values. While optimizing the code, I figured it might be a good idea to first compute the product of c_1,...,c_n and then divide that product by 2^n. For reasons I cannot really get into in this post, the n in the product defining w might get quite large, possibly even around 70 or 80. Right now I compute the final value of w by i.) computing the aforementioned product of c_1,...,c_n and storing it in w, ii.) computing 2^n and storing it in a long long variable a, iii.) setting w = w/a.

To my knowledge, the largest power of 2 that fits in a 64-bit long long is 2^62 (2^63 only fits in an unsigned long long), so I was wondering whether it would be better to divide w in a for loop, like

for (int k { 0 }; k < n; ++k) {
    w /= 2;
}

to avoid storing a value that is too large in the aforementioned variable a. I am aware that this can be reduced to logarithmic time by repeatedly dividing by suitable powers of 2, but the point still remains.
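The 64-bit limit only concerns the integer variable a. A double can hold 2^80 exactly, so the divisor itself can be a double. Below is a minimal sketch comparing the loop above with a single division. The helper names `halve_n_times` and `divide_by_pow2` are made up for illustration, and `std::ldexp(1.0, n)` is used here only as one way to obtain 2^n as a double; IEEE 754 doubles are assumed.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the loop from the question, n halvings in a row.
double halve_n_times(double w, int n) {
    assert(n >= 0);
    for (int k{0}; k < n; ++k) {
        w /= 2;
    }
    return w;
}

// Hypothetical helper: one division by 2^n, with 2^n held in a double
// instead of a long long. std::ldexp(1.0, n) computes 1.0 * 2^n.
double divide_by_pow2(double w, int n) {
    assert(n >= 0);
    return w / std::ldexp(1.0, n);
}
```

As long as the result stays in the normal range of double, both helpers should return the same value, because dividing by a power of 2 only changes the exponent.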

Alternatively, I could just not factor the 1/2s out of the product and instead compute w the old way. However, with this approach it is not clear to me how much worse the numerical stability would be, or whether there is anything to be concerned about in the first place.
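To make the comparison concrete, here is a small sketch of the two orderings. The function names `product_of_halves` and `product_then_halve` are hypothetical. It assumes IEEE 754 doubles with the default rounding mode and no intermediate overflow or underflow; under those assumptions halving only adjusts the exponent, so both orderings are expected to give the same result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: the "old way", halving every factor before multiplying.
double product_of_halves(const std::vector<double>& c) {
    double w = 1.0;
    for (double ci : c) {
        w *= ci / 2;
    }
    return w;
}

// Hypothetical helper: multiply all factors first, then halve once per factor.
double product_then_halve(const std::vector<double>& c) {
    double w = 1.0;
    for (double ci : c) {
        w *= ci;
    }
    for (std::size_t k = 0; k < c.size(); ++k) {
        w /= 2;
    }
    return w;
}
```

The orderings differ only at the edges of the range. The unscaled product can overflow where the halved product would not, and the halved product can reach the subnormal range sooner when the factors are small.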


Solution

  • The answer was among the comments but did not get proper attention. I am not going to discuss the accuracy/precision concerns, because anyone dealing with floating point must already be aware of them and take responsibility for that part. The question at hand has a one-line answer, the library function ldexp:

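    #include <algorithm>   // std::ranges::fold_left (C++23)
    #include <cmath>       // std::ldexp
    #include <functional>  // std::multiplies
    #include <ranges>      // std::ranges::distance
    // assume C[N] is modeled by a C++ range named c_range:
    auto c = std::ranges::fold_left(c_range, 1.0L, std::multiplies<>{});
    // int{...} would be a narrowing conversion here, hence the static_cast
    auto n = -static_cast<int>(std::ranges::distance(c_range));
    auto res = std::ldexp(c, n);  // c * 2^n, with n negative
    // res = c * pow(2, n);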
    

    The commented-out line might be less precise and slower than the std::ldexp call.
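For compilers without C++23 support, the same idea can be sketched with `std::accumulate` in place of `std::ranges::fold_left`. This is an illustrative variant, not the snippet above: the function name `scaled_product` and the use of `std::vector<double>` as the container are assumptions.

```cpp
#include <cmath>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: product of all c_i, scaled by 2^-n with n = c.size().
double scaled_product(const std::vector<double>& c) {
    // std::accumulate stands in for std::ranges::fold_left here.
    const double product =
        std::accumulate(c.begin(), c.end(), 1.0, std::multiplies<>{});
    const int n = static_cast<int>(c.size());
    // std::ldexp(x, e) computes x * 2^e; a negative e divides by 2^n.
    return std::ldexp(product, -n);
}
```

For example, `scaled_product({3.0, 5.0})` computes 15 * 2^-2. An input of 80 factors works the same way, since the exponent is passed as a plain int and no integer power of 2 is ever stored.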