Tags: java, c++, floating-point, double, precision

Rounding when ULP 1 - Java and C++


This question originated from differences in the results that I observed when computing std::exp(x) in C++ against Math.exp(x) in Java (with x of type double). I noticed that the algorithms used for computing exp() are different between the two languages. However, I believe that they both ensure that the rounding error is at most 1 ULP.

So the question is the following: how does the rounding happen when ULP is set to 1? How does it decide whether to round up or down, and "who" makes that choice?

I would like to obtain the exact same results as Java when computing exp(x) in C++. If I replicate in C++ the exact algorithm that Java uses within Math.exp() (which actually delegates to fdlibm), will I obtain identical results between the two languages? Or is a consequence of the 1 ULP bound that the rounding up or down is arbitrary, and therefore it does not depend on the specific algorithm?

Note: my understanding of 1 ULP is that if the result of a mathematical function is, let's assume, f(x) = 0.55 and the floating-point numbers closest to it are 0.52 and 0.56, then (even if 0.56 would be the optimal choice) the result could be either 0.52 or 0.56.

Further side question: I also observed that switching in C++ between x86 and x64 changes the results of exp(x), sometimes aligning them with the results given by Java. Why do the results of exp(x) change in C++ between the x86 and x64 builds? The compiler is MSVC 14.16.27023.

Update: I have used the fdlibm implementation of exp() (code provided at https://www.netlib.org/fdlibm/e_exp.c) and I actually get a result which is different from both std::exp() in C++ and Math.exp() in Java. The specific values that I'm trying are exp(-0.271153846153846134203746487401076592504978179931640625) and exp(-0.03076923076923077093880465326947160065174102783203125).
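
A minimal way to compare such results down to the last bit is to print them both in decimal (with enough digits to round-trip a double) and as hex floats. A small sketch of that:

    #include <cmath>
    #include <cstdio>

    int main() {
        const double xs[] = {
            -0.271153846153846134203746487401076592504978179931640625,
            -0.03076923076923077093880465326947160065174102783203125
        };
        for (double x : xs) {
            double y = std::exp(x);
            // %.17g prints enough digits to round-trip a binary64 value;
            // %a shows the exact bits, so 1-ULP differences are obvious.
            std::printf("exp(%.17g) = %.17g = %a\n", x, y, y);
        }
    }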


Solution

  • Note: my understanding of 1 ULP is that if the result of a mathematical function is, let's assume, f(x) = 0.55 and the floating-point numbers closest to it are 0.52 and 0.56, then (even if 0.56 would be the optimal choice) the result could be either 0.52 or 0.56.

    This is not correct. The C++ standard does not specify how accurately the mathematical routines in the standard library must be calculated, and implementations vary. And note the plural: There are multiple implementations. Your question refers to “std::exp(x) in C++” as if it were a single thing. It is not. There are different implementations of exp used in different implementations of C++. A single C++ compiler may even be linked with different choices of libraries containing implementations of std::exp.

    I am less familiar with the specifications of the Java standard in this regard, although I know that its StrictMath class is defined with respect to fdlibm version 5.3, so it should give reproducible results across different implementations.

    The property you describe, where the result computed for f(x) is one of the two floating-point numbers immediately bounding the mathematical f(x), is called faithful rounding. (There may be slight variations in the definition of “faithful rounding” used in various places.)

    It is common that implementations of math routines are not faithfully rounded. Errors of multiple ULP may be common, and greater errors may exist.
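
    As a concrete way to see how far apart two computed results are, you can compare their bit patterns and count the representable values between them. A minimal sketch, assuming IEEE-754 binary64 for double and finite, reasonably close arguments:

        #include <cstdint>
        #include <cstdlib>
        #include <cstring>

        // Map a double's bit pattern to an integer whose order matches the
        // numeric order of the doubles (so adjacent doubles differ by 1).
        static std::int64_t ordered_bits(double d) {
            std::int64_t i;
            std::memcpy(&i, &d, sizeof i);
            return i < 0 ? INT64_MIN - i : i;
        }

        // Number of representable binary64 values between a and b.
        // For two results of the same exp(x) this will typically be 0 or 1.
        std::int64_t ulp_distance(double a, double b) {
            return std::llabs(ordered_bits(a) - ordered_bits(b));
        }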

    It is unlikely you can get the same results in Java and C++ except by using an fdlibm version 5.3 implementation with C++.

    For accuracy in math routines, the best goal is correct rounding, which means the computed result for f(x) equals the result of rounding the mathematical f(x) to the nearest value representable in the floating-point format using a specified rounding rule. (The most common rounding rule is round-to-nearest, ties-to-even. Other rounding rules include round upward, round downward, and round toward zero.) Correct rounding is generally hard, because a mathematical function may have points arbitrarily close to a point where rounding switches, so there is no finite limit on how accurately a function may have to be computed to determine which side of the rounding point the value falls on. Each function has to be considered individually.
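
    For illustration, the dynamic rounding direction for basic arithmetic can be changed through <cfenv>. A small sketch (whether the compiler actually honors the mode at run time depends on flags such as MSVC's /fp:strict or GCC's -frounding-math):

        #include <cfenv>
        #include <cstdio>

        int main() {
            const int   modes[] = {FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO};
            const char* names[] = {"to nearest", "upward", "downward", "toward zero"};
            for (int i = 0; i < 4; ++i) {
                std::fesetround(modes[i]);
                // 1.0/10.0 is not exactly representable in binary64, so the
                // result differs in the last bit between rounding directions.
                volatile double q = 1.0;
                q = q / 10.0;
                std::printf("%-12s %.20g\n", names[i], (double)q);
            }
            std::fesetround(FE_TONEAREST);
        }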

    For the format commonly used for double, IEEE-754 binary64, humans do not yet know whether all of the math library routines can be computed with correct rounding in a known bounded time. The Core-MATH project is working on this and has produced correctly rounded implementations of many of the math library routines, but not yet all.

    exp is one of the easier functions, so it is done; a correctly rounded implementation is available. However, you can expect a correctly rounded implementation to differ from the fdlibm routine used by Java. So conformance between languages and prior implementations is actually an impediment to improving the math library.
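
    If you want a correctly rounded result, a sketch of how that could be wired up in C++, assuming the CORE-MATH convention of prefixing its correctly rounded binary64 routines with cr_ (verify the actual name and build instructions against the project before relying on this):

        // Assumed declaration for CORE-MATH's correctly rounded binary64 exp;
        // you would compile and link the corresponding source file from the project.
        extern "C" double cr_exp(double);

        #include <cmath>
        #include <cstdio>

        int main() {
            double x = -0.271153846153846134203746487401076592504978179931640625;
            std::printf("std::exp: %a\n", std::exp(x));
            std::printf("cr_exp:   %a\n", cr_exp(x));  // correctly rounded result
        }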

    If I replicate in C++ the exact algorithm that Java uses within Math.exp() (which actually delegates to fdlibm), will I obtain identical results between the two languages?

    That can be done, but it can be complicated. Suppose fdlibm has some code y = ((a*x + b)*x + c)*x + d;. You can reproduce that code in C++. However, the C++ standard does not require that expression to be evaluated using double arithmetic even if all the variables are double. It allows the C++ implementation to use extra precision. If you wish to reproduce fdlibm results exactly in C++, you will need to learn about floating-point evaluation in C++ and tightly control it.
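
    A sketch of what "tightly control it" can look like in practice (the variable names are illustrative, not fdlibm's; whether assignments really discard extra precision can also depend on compiler flags such as MSVC's /fp:strict or GCC's -ffloat-store):

        #include <cfloat>
        #include <cstdio>

        // May be evaluated with extra precision (e.g. x87 80-bit registers)
        // when FLT_EVAL_METHOD is not 0.
        double poly_plain(double x, double a, double b, double c, double d) {
            return ((a * x + b) * x + c) * x + d;
        }

        // Storing each intermediate into a double is intended to discard any
        // extra precision, so every step rounds to binary64.
        double poly_stepwise(double x, double a, double b, double c, double d) {
            double t = a * x;
            t = t + b;
            t = t * x;
            t = t + c;
            t = t * x;
            t = t + d;
            return t;
        }

        int main() {
            // FLT_EVAL_METHOD reports how floating-point expressions are
            // evaluated: 0 means in the declared type, 2 means long double.
            std::printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
        }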

    Additional Notes

    The text “how does the rounding happen when ULP?” appears to be ungrammatical. “ULP” is not a verb, and I do not know what you mean by this.

    How does it decide whether to round up or down, and "who" makes that choice?

    A mathematical software engineer analyzes the function, the hardware and software available to them, and crafts an implementation. Implementations of math library routines commonly include code for argument reduction and evaluation of a polynomial. The polynomial is often a minimax polynomial. The engineer makes decisions about the precision with which to perform the argument reduction and the polynomial evaluation, the degree to use for the polynomial, endpoints of the polynomial in case they split the function domain into subintervals, and so on. In the end, they end up with code that approximates the function f. Whether rounding up or down occurs for any particular input is a happenstance of the behavior of the approximation at that point.