The FMA(3) page of the BSD Library Functions Manual says "These functions compute x * y + z."
So what's the difference between fma and naive code that simply computes x * y + z? And why does fma have better performance in most cases?
a*b+c produces a result as if the computation were:

- Multiply a and b, producing an exact intermediate product.
- Round the product to the floating-point format.
- Add c.
- Round the sum to the floating-point format.

fma(a, b, c) produces a result as if the computation were:

- Multiply a and b, producing an exact intermediate product.
- Add c.
- Round the sum to the floating-point format.

So it skips the step of rounding the intermediate product to the floating-point format.
On a processor with an FMA instruction, a fused multiply-add may be faster because it is one floating-point instruction instead of two, and hardware engineers can often design the processor to do it efficiently. On a processor without an FMA instruction, a fused multiply-add may be slower because the software has to use extra instructions to maintain the information necessary to get the required result.