
Difference between FMA and naive a*b+c?


In the BSD Library Functions Manual page for FMA(3), it says "These functions compute x * y + z."

So what's the difference between FMA and naive code that does x * y + z? And why does FMA perform better in most cases?


Solution

  • a*b+c produces a result as if the computation were:

    • Calculate the infinitely precise product of a and b.
    • Round that product to the floating-point format being used.
    • Calculate the infinitely precise sum of that result and c.
    • Round that sum to the floating-point format being used.

    fma(a, b, c) produces a result as if the computation were:

    • Calculate the infinitely precise product of a and b.
    • Calculate the infinitely precise sum of that product and c.
    • Round that sum to the floating-point format being used.

    So it skips the step of rounding the intermediate product to the floating-point format, which means the result can differ from the naive computation in the last bit (or more).
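    The difference is easy to observe in C. A minimal sketch (the value `1 + 2^-29` is just a convenient example input, chosen so the exact square `1 + 2^-28 + 2^-58` needs more than the 53 significand bits of a double):

    ```c
    #include <math.h>   /* fma(); link with -lm on many systems */
    #include <stdio.h>

    int main(void) {
        double a = 1.0 + 0x1p-29;         /* 1 + 2^-29, exactly representable */
        double naive = a * a;             /* exact product is 1 + 2^-28 + 2^-58,
                                             but it rounds to 1 + 2^-28, so the
                                             2^-58 term is lost */
        double fused = fma(a, a, -naive); /* exact product plus (-naive), rounded
                                             once: recovers the lost 2^-58 */
        printf("rounding error the naive product lost: %a\n", fused);
        return 0;
    }
    ```

    Here `fma(a, a, -naive)` is nonzero precisely because the fused operation never rounds the intermediate product; this `fma(a, b, -a*b)` pattern is a standard way to capture the rounding error of a multiplication.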

    On a processor with an FMA instruction, a fused multiply-add may be faster because it is one floating-point instruction instead of two, and hardware engineers can often design the processor to do it efficiently. On a processor without an FMA instruction, a fused multiply-add may be slower because the software has to use extra instructions to maintain the information necessary to get the required result.
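    To give a feel for those "extra instructions": one classic building block of software FMA emulation is Dekker's error-free multiplication, which splits each operand into halves whose products are exact. A sketch (assuming round-to-nearest doubles with no extended-precision evaluation; `split` and `two_product` are illustrative names, not a standard API):

    ```c
    #include <stdio.h>

    /* Dekker's splitting: break x into hi + lo, each with at most 26
     * significand bits, so products of halves are exact in double. */
    static void split(double x, double *hi, double *lo) {
        double t = (0x1p27 + 1.0) * x;  /* 2^27 + 1 is Dekker's factor */
        *hi = t - (t - x);
        *lo = x - *hi;
    }

    /* Error-free transformation: returns the rounded product a*b and
     * stores in *err the exact rounding error, so that mathematically
     * a*b == p + *err. */
    static double two_product(double a, double b, double *err) {
        double p = a * b;
        double ahi, alo, bhi, blo;
        split(a, &ahi, &alo);
        split(b, &bhi, &blo);
        *err = ((ahi * bhi - p) + ahi * blo + alo * bhi) + alo * blo;
        return p;
    }

    int main(void) {
        double a = 1.0 + 0x1p-29;
        double err;
        double p = two_product(a, a, &err);
        printf("rounded product = %a, rounding error = %a\n", p, err);
        return 0;
    }
    ```

    A software fma must then carefully add z to the (p, err) pair and round once, so a single fused operation becomes a dozen or more multiplies and adds, which is why emulated fma is slower than the naive expression even though it is more accurate.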