Search code examples
c++divisionmodulofloating-accuracy

Decomposing floating-point division into integer and fractional part


I'm trying to do integer division + modulo using doubles (for spline-based interpolation), but I've encountered some issues relating to floating-point precision when using std::floor and std::fmod.

I've been using the equivalent of div1 below, but at 50 it produces incorrect results (that is, the integer part is 3, but the modulo part is the divisor minus epsilon). div2 works but is rather convoluted. div3 is at least consistent, but does not return the type of result I would like (the remainder may be negative, so it would need futher manipulation before I could use it).

#include <iostream>
#include <cmath>

std::pair<double, double> div1(int num, double denom){
    double whole = std::floor(num / denom);
    double remain = std::fmod(num, denom);
    return {whole, remain};
}
std::pair<double, double> div2(int num, double denom){
    double floatdiv = num / denom;
    double whole;
    double remain = std::modf(floatdiv, &whole);
    return {whole, remain * denom};
}
std::pair<double, double> div3(int num, double denom){
    double whole = std::round(num / denom);
    double remain = std::remainder(num, denom);
    return {whole, remain};
}
int main() {
    double denom = 100.0 / 6;
    int divtype = 0;
    for(auto div: {div1, div2, div3}){
        std::cerr << "== Using div" << ++divtype << " for this run ==\n";
        for(int i = 40; i <= 60; ++i){
            auto res = div(i, denom);
            std::cerr << i << ": " << res.first << ", " << res.second << " = " << res.first * denom + res.second << "\n";
        }
        auto oldprec = std::cerr.precision(64);
        auto res = div(50, denom);
        std::cerr << 50 << ": " << res.first << ", " << res.second << " = " << res.first << " * " << denom << " + " << res.second << " = " << std::floor(res.first) * denom + res.second << "\n";
        std::cerr.precision(oldprec);
        std::cerr << "\n";
    }
    return 0;
}

https://ideone.com/9UbHcE

For the case of 50, the following results are produced:

 - div1: 3, 16.6666... 
 - div2: 3, 0
 - div3: 3, -3e-15

Am I doing something wrong, or is std::floor(num / denom) + std::fmod(num, denom) not reliable? If so, what is a good replacement? Is div2 the best option?

Version of code example with most answers included: https://ideone.com/l2wGRj


Solution

  • Your core problem is that denom = 100.0/6 is not the same as the mathematically exact value denomMath = 100/6 = 50/3, because it can't be represented as a sum of powers of two. We can write denom = denomMath + eps (with a small positive or negative epsilon). After assigning it, denom is indistinguishable from the closest floating point number! If you now try to divide some value denomMath * k = denom * k + eps * k by denom, for a sufficiently large k you will get the wrong result mathematically (i.e. in exact arithmetic) already - you have no hope in this case. How soon this happens depends on the magnitudes involved (if the values are < 1 then all your div will yield whole parts of zero and be exact, whereas for values larger than 2^54 you cannot even represent odd numbers).

    But even before that, there is no guarantee that dividing a (mathematical) multiple of denomMath by denom yields something that can be floored or fmoded to the right whole number. Rounding might keep you safe for a while, but as shown above, only as long as the errors don't get too large.

    So:

    • div1 runs into the problem described here: https://en.cppreference.com/w/cpp/numeric/math/fmod

      The expression x - trunc(x/y)*y may not equal fmod(x,y) when the rounding of x/y to initialize the argument of trunc loses too much precision (example: x = 30.508474576271183309, y = 6.1016949152542370172)

      In your case, 50 / denom yields a number that is slightly too large (3) compared to the exact result (3 - some epsilon because denom is slightly larger than denomMath)

      You cannot rely on std::floor(num / denom) + std::fmod(num, denom) to equal num.

    • div2 has the problem described above: In your case it works, but if you try more cases you'll find one where num / denom is slightly too small instead of too large and it will fail too.

    • div3 has promise as mentioned above. It actually gives you the most precise result you could hope for.