c gcc arm arm64 fpu

Retain bit-exact floating-point calculations on AArch64 with -O2

I'm comparing the outputs of a signal processing library that uses floating-point math and was built for AArch64 (ARMv8) with, e.g., gcc 4.9.

The results differ depending on the optimization level. Unoptimized builds (-O0) produce bit-exact results with respect to an ARMv7 reference. On ARMv7, -O2 builds did not introduce deviations in the floating-point calculations. This is not the case on ARMv8: optimized builds calculate a different result.

Are there compiler switches that keep optimized builds bit-exact with non-optimized builds?

Tests have been performed on a DragonBoard 410c (Cortex-A53).

Solution

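Unless your ARMv7-A builds already used -mfpu=vfpv4 or equivalent (in which case this answer is probably wrong), the most likely difference you are seeing is the generation of fused multiply-add (FMA) operations. AArch64 always provides FMA instructions, whereas ARMv7-A only has them with VFPv4, and an FMA rounds once where a separate multiply and add round twice.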

To avoid this, use -ffp-contract=off. The GCC documentation for this option says:

-ffp-contract=style

-ffp-contract=off disables floating-point expression contraction.

-ffp-contract=fast enables floating-point expression contraction such as forming of fused multiply-add operations if the target has native support for them.

-ffp-contract=on enables floating-point expression contraction if allowed by the language standard. This is currently not implemented and treated equal to -ffp-contract=off.

The default is -ffp-contract=fast.
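
To see why contraction can change the last bits of a result, here is a minimal sketch in C. It is an illustration, not code from the library in question: the function names and the input values are made up for the example. The single-rounding path is modelled by carrying the exact product in a double, which matches a true fused multiply-add for these inputs only; in real code the C99 function fmaf() from <math.h> is the actual fused operation.

```c
#include <stdio.h>

/* Multiply-add with two roundings: the product is rounded to float
 * before the addition. This is what an unfused build computes.
 * The volatile store keeps the compiler from contracting the
 * expression into an FMA, whatever -ffp-contract is set to. */
float mac_two_roundings(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

/* Multiply-add with one rounding, modelled in double precision.
 * A product of two floats (24-bit significands) is exact in a double.
 * Assumption: the following addition is also exact, which holds for
 * the inputs below but not for arbitrary values. */
float mac_single_rounding(float a, float b, float c)
{
    double exact_product = (double)a * (double)b;
    return (float)(exact_product + (double)c);
}

/* Hypothetical helper that prints both results for one input set. */
void print_mac_comparison(void)
{
    const float a = 0x1.001p0f;   /* 1 + 2^-12 */
    const float c = -0x1.002p0f;  /* -(1 + 2^-11) */

    /* a * a = 1 + 2^-11 + 2^-24 exactly. Rounded to float this is
     * 1 + 2^-11, so the two-rounding path loses the 2^-24 term. */
    printf("two roundings:   %a\n", (double)mac_two_roundings(a, a, c));
    printf("single rounding: %a\n", (double)mac_single_rounding(a, a, c));
}
```

With these inputs the two-rounding path returns exactly 0, while the single-rounding path returns 2^-24. A build with -O2 -ffp-contract=off keeps an expression like a * b + c on the two-rounding path, which is also what -O0 produces.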