I disassembled an arm binary previously compiled with neon flags:
-mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize
The dump shows a vdiv.f64 instruction generated by the compiler. According to the arm manual for armv7 (cortex-a9) neon simd isa does not support vdiv instruction but the floating point (vfp) engine does. Why is this instruction generated? is it then a floating point instruction that will be executed by the vfp? Both neon and VFP support addition and multiplication for floating point so how can I differenciate them from eahc other?
In the case of Cortex-A9, the NEON FPU option also implements VFP; it is a superset of the cut-down 16-register VFP-only FPU option.
More generally, the architecture does not allow implementing floating-point Advanced SIMD without also implementing at least single-precision VFP, therefore GCC's -mfpu=neon
implies VFPv3 as well. It is permissible to implement integer-only Advanced SIMD without any floating-point capability at all, but I'm not sure GCC can support that (or that anyone's ever built such a thing).
The actual VFP and Advanced SIMD variants of instructions are unambiguous from the syntax - anything operating on double-precision data (i.e. <op>.F64
) is obviously VFP, as Advanced SIMD doesn't support double-precision. Single precision operations (i.e. <op>.F32
) operating on 32-bit s
registers are scalar, thus VFP; if they're operating on larger 64-bit d
or 128-bit q
registers, then they are handling multiple 32-bit values at once, thus are vectorised Advanced SIMD instructions.