I'm trying to do integer division + modulo using doubles (for spline-based interpolation), but I've encountered some issues relating to floating-point precision when using std::floor
and std::fmod
.
I've been using the equivalent of div1
below, but at 50 it produces incorrect results (that is, the integer part is 3, but the modulo part is the divisor minus epsilon). div2
works but is rather convoluted. div3
is at least consistent, but does not return the type of result I would like (the remainder may be negative, so it would need futher manipulation before I could use it).
#include <iostream>
#include <cmath>
std::pair<double, double> div1(int num, double denom){
double whole = std::floor(num / denom);
double remain = std::fmod(num, denom);
return {whole, remain};
}
std::pair<double, double> div2(int num, double denom){
double floatdiv = num / denom;
double whole;
double remain = std::modf(floatdiv, &whole);
return {whole, remain * denom};
}
std::pair<double, double> div3(int num, double denom){
double whole = std::round(num / denom);
double remain = std::remainder(num, denom);
return {whole, remain};
}
int main() {
double denom = 100.0 / 6;
int divtype = 0;
for(auto div: {div1, div2, div3}){
std::cerr << "== Using div" << ++divtype << " for this run ==\n";
for(int i = 40; i <= 60; ++i){
auto res = div(i, denom);
std::cerr << i << ": " << res.first << ", " << res.second << " = " << res.first * denom + res.second << "\n";
}
auto oldprec = std::cerr.precision(64);
auto res = div(50, denom);
std::cerr << 50 << ": " << res.first << ", " << res.second << " = " << res.first << " * " << denom << " + " << res.second << " = " << std::floor(res.first) * denom + res.second << "\n";
std::cerr.precision(oldprec);
std::cerr << "\n";
}
return 0;
}
For the case of 50, the following results are produced:
- div1: 3, 16.6666...
- div2: 3, 0
- div3: 3, -3e-15
Am I doing something wrong, or is std::floor(num / denom)
+ std::fmod(num, denom)
not reliable? If so, what is a good replacement? Is div2
the best option?
Version of code example with most answers included: https://ideone.com/l2wGRj
Your core problem is that denom = 100.0/6
is not the same as the mathematically exact value denomMath = 100/6 = 50/3
, because it can't be represented as a sum of powers of two. We can write denom = denomMath + eps
(with a small positive or negative epsilon). After assigning it, denom
is indistinguishable from the closest floating point number! If you now try to divide some value denomMath * k = denom * k + eps * k
by denom
, for a sufficiently large k
you will get the wrong result mathematically (i.e. in exact arithmetic) already - you have no hope in this case. How soon this happens depends on the magnitudes involved (if the values are < 1 then all your div
will yield whole parts of zero and be exact, whereas for values larger than 2^54
you cannot even represent odd numbers).
But even before that, there is no guarantee that dividing a (mathematical) multiple of denomMath
by denom
yields something that can be floor
ed or fmod
ed to the right whole number. Rounding might keep you safe for a while, but as shown above, only as long as the errors don't get too large.
So:
div1
runs into the problem described here: https://en.cppreference.com/w/cpp/numeric/math/fmod
The expression
x - trunc(x/y)*y
may not equalfmod(x,y)
when the rounding ofx/y
to initialize the argument oftrunc
loses too much precision (example:x = 30.508474576271183309
,y = 6.1016949152542370172
)
In your case, 50 / denom
yields a number that is slightly too large (3
) compared to the exact result (3 - some epsilon
because denom
is slightly larger than denomMath
)
You cannot rely on std::floor(num / denom) + std::fmod(num, denom)
to equal num
.
div2
has the problem described above: In your case it works, but if you try more cases you'll find one where num / denom
is slightly too small instead of too large and it will fail too.
div3
has promise as mentioned above. It actually gives you the most precise result you could hope for.