Tags: gcc, arm, compiler-optimization, cortex-m

arm-none-eabi-gcc not inferring floating point multiply-accumulate from code


The ARM FPv5 instruction set supports double-precision floating-point operations, including single-cycle multiply-accumulate instructions (VMLA/VMLS), as detailed in ARM's ISA documentation.

Unfortunately, I can't get the compiler to emit these instructions for any of my C code.

Here is a simple example:

float64_t a = 0, b = 0, c = 0;

while (1)
{
        b += 1.643;
        c += 3.901;
        a += b * c;  // expected to become a multiply-accumulate

        do_stuff(a); // use the MAC result
}

The code above generates the following assembly for what I believe should be a single MAC operation:

170               a += b * c;
00000efe:   vldr    d6, [r7, #64]   ; 0x40
00000f02:   vldr    d7, [r7, #56]   ; 0x38
00000f06:   vmul.f64        d7, d6, d7
00000f0a:   vldr    d6, [r7, #72]   ; 0x48
00000f0e:   vadd.f64        d7, d6, d7
00000f12:   vstr    d7, [r7, #72]   ; 0x48

As you can see, it performs the multiply and the add as separate steps. Is there a good reason why the compiler can't use the VMLA.F64 instruction here?

  • Target: ARM Cortex M7 (NXP iMXRT1051)
  • Toolchain: arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 8-2018-q4-major) 8.2.1 20181213 (release) [gcc-8-branch revision 267074]

Solution

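  • Solved. It was the optimization level. With -O3, the compiler emits the multiply-accumulate instruction instead of a separate multiply and add.

    I had assumed that taking advantage of hardware acceleration (e.g. the FPU) wouldn't depend on the optimization level, since it's essentially "free", but I was wrong: merging the multiply and the add into one instruction is itself a compiler optimization, so it doesn't happen in an unoptimized build.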
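
    To check this on a small case, here is a minimal sketch. The file name mac.c and the function names mac_step and mac_loop are made up for illustration; the loop body is the one from the question, with plain double standing in for float64_t. Assuming the same toolchain, compiling it with something like "arm-none-eabi-gcc -mcpu=cortex-m7 -mfpu=fpv5-d16 -mfloat-abi=hard -O2 -S mac.c" and comparing the output against -O0 should show whether the vmul.f64/vadd.f64 pair is replaced by a single multiply-accumulate instruction. Which instruction appears (for example vmla.f64 or the fused vfma.f64) may depend on the GCC version and the -ffp-contract setting, so treat these flags as a starting point rather than a guaranteed recipe.

```c
/* mac.c: hypothetical file name, used only for this sketch. */

/* One multiply-accumulate step: returns acc + b * c.
 * Written as a single expression so an optimizing compiler is free
 * to merge the multiply and the add into one instruction. */
double mac_step(double acc, double b, double c)
{
    return acc + b * c;
}

/* Runs n iterations of the loop body from the question and
 * returns the final accumulator value. */
double mac_loop(int n)
{
    double a = 0.0, b = 0.0, c = 0.0;

    for (int i = 0; i < n; i++) {
        b += 1.643;
        c += 3.901;
        a = mac_step(a, b, c);
    }
    return a;
}
```

    Keeping the step in its own small function makes the relevant instructions easy to find in the generated assembly, since they are not mixed in with the rest of the application.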