Search code examples
cprecisionlibclibmgcc6

Wrong result for atan2 with glibc / libm and float32


I am currently developing the firmware for a medical device where a lot of difficult mathematical operations is involved. The target processor supports floating point operation in hardware, but only float32 (aka single).

To simulate the behavior and prove the correctness of my formulae and code, I have ported the relevant / mathematical part of the firmware to the GCC toolchain in Linux (gcc 6.3.0, libc6 2.24), double-checking that float32 is used everywhere and that no compiler switch is used which could reduce the precision or standards compatibility of mathematical operations; notably, there is none of -ffast-math or its friends.

Now, it has turned out that I am getting unexpected results for a small set of input parameters. I have tracked down the problem and have come to the conclusion that libm calculates a wrong result for arctan (to be precise: atan2) for a very small set of input parameters.

For example, if I have

#include <math.h>

#define C_RAD2DEG (57.29577951308f)

int main(void)
{
  float f_Temp = C_RAD2DEG * atan2f(0.713114202f, 0.665558934f);
}

f_Temp is computed to be 46.9755516f, where the right result would be 46.975548972f.

Please note that I am generally aware about the issues with different floating point data types, rounding errors and the like.

However, my feeling is that the error shown above is too high by an order of magnitude even given the low precision of float32, and unfortunately, for the calculations which follow, that error is too much.

Furthermore, only a very small subset of possible input parameters to the atan2 function is affected by the problem.

Could anybody please shortly explain if this is a bug in libm or if it is just due to the imprecision of float32 and the great number of sequential operations needed to compute atan2?


Solution

  • The number you report as the observed result, 46.9755516f, corresponds to the float value 46.975551605224609375.

    The number you report as the expected result, 46.975548972f, corresponds to the float value 46.97554779052734375.

    These are adjacent float values, meaning they differ by 1 unit of least precision (ULP). (Their difference is 3.814697265625e-06, which is the value of the least significant bit in the float significand when the most significant bit has value 32, as it does for numbers around 47.) This is the smallest possible amount by which a float can change at that scale.

    Generally, the math library routines are difficult to implement, and nobody has implemented all of them with correct rounding (rounding to the representable number that is nearest the exact mathematical value) and known bounded run time. A few ULP of error is not unusual in the trigonometric routines.

    Even if the libc code you used provided a correctly rounded result, converting it from radians to degrees introduces two more rounding errors (converting 180/π to a representable value and multiplying by it). It is not reasonable to expect the final result to be the float that is nearest the ideal mathematical result; you should expect several ULP of error.