Search code examples
floating-accuracy

Why implement lround specifically using integer math?


I noticed that the C++ standard library has separate functions for round and lround rather than just having you use long(round(x)) for the latter.

Looking into the implementation in glibc, I find that indeed, for platforms using IEEE754 floating point, the version that returns an integer will directly manipulate the bits from within the floating point representation, and not do the rounding using floating point operations (e.g. adding ±0.5).

What is the benefit of having a distinct implementation when you want the result as an integer type? Is this supposed to be faster, or more accurate? If it is better to use integer math on the underlying representation, why not just always do it that way even if returning the result as a double?


Solution

  • One reason is that adding .5 is insufficient. Let’s say you add .5 and then truncate to an integer. (How? Is there an instruction for that? Or are you doing more work?) If x is ½−2−54 (the greatest representable value less than ½), adding .5 yields 1, because the mathematical sum, 1−2−54, is exactly halfway between the nearest two representable values, 1−2−53 and 1, and the common default rounding mode, round-to-nearest-ties-to-even, rounds that to 1. But the correct result for lround(x) is 0.

    And, of course, lround is specified to round ties away from zero, regardless of the current rounding mode. You could set the rounding mode, do some arithmetic, and restore the rounding mode, but there are problems with this.

    One is that changing the rounding mode is a typically a time-consuming operation. The rounding mode is a global state that affects most floating-point instructions. So the processor has to ensure all pending instructions complete with the prior mode, change the global state, and ensure all later instructions start after that change.

    If you are lucky, you might have a processor with per-instruction rounding modes or something similar, and then you can use any rounding mode you like without time penalty. Hewlett Packard has some processors like that. However, “round away from zero” is an uncommon mode. Most processors have round-to-nearest-ties-to-even, round toward zero, round down (toward −∞), and round up (toward +∞), and round-to-odd is becoming popular for its value in avoiding double-rounding errors. But round away from zero is rare.

    Another reason is that doing floating-point instructions alters the floating-point status flags and may generate traps, but it is desired that library routines behave as single operations. For example, if we add .5 and rounding occurs, the inexact flag will be raised, since the floating-point addition with .5 produced a result different from the mathematical sum. But to the user of lround, no inexact condition ever occurs; lround is defined to return a value rounded to an integer, and it always does so—within the long range, it never returns a computed result different from its ideal mathematical definition. So if lround(x) raised the inexact flag, that would be incorrect behavior. To avoid it, an implementation that used floating-point instructions would have to save the current floating-point flags, do its work, and restore the flags before returning.