gcc options to use x87 and AVX simultaneously, but not SSE

When compiling for a processor that supports the AVX extension (say -m64 -march=corei7-avx -mtune=corei7-avx is applicable), does it make sense to use the -mfpmath=both -mavx options at the same time? Wouldn't that cause the compiler to use three instruction sets (x87, SSE, AVX) at the same time? Or would it use x87 only for scalars (in some sense) and AVX only for vectors?

Solution

You normally don't want this. I don't think gcc is smart enough about choosing x87 only when register pressure is high for it to be worth it.

x87 and SSE/AVX instructions compete for the same FP execution units on normal x86 CPUs (Intel and AMD), so you don't get more throughput from interleaving them.

Normally you should just use -mfpmath=sse (which means AVX-encoded scalar instructions when used with -mavx), or better, -mfpmath=sse -march=native. The default for x86-64 is sse, so -mfpmath=sse only changes anything for -m32, AFAIK.
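To check what a given set of flags actually selects, a small C translation unit can inspect the compiler's predefined macros. This is a sketch assuming GCC's x86 macros: __SSE_MATH__ and __SSE2_MATH__ are defined when float and double math may use SSE registers (which is also the case under -mfpmath=both, so the macros cannot tell sse and both apart), and __AVX__ is defined when AVX is enabled. The function names and the flag enum are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical bit flags summarizing the FP code-generation macros. */
enum {
    FP_SSE_MATH  = 1, /* __SSE_MATH__: float math may use SSE registers */
    FP_SSE2_MATH = 2, /* __SSE2_MATH__: double math may use SSE2 registers */
    FP_AVX       = 4  /* __AVX__: AVX enabled, so SSE math gets VEX encoding */
};

/* Returns which of the macros above were defined for this translation unit. */
int fp_codegen_flags(void)
{
    int flags = 0;
#ifdef __SSE_MATH__
    flags |= FP_SSE_MATH;
#endif
#ifdef __SSE2_MATH__
    flags |= FP_SSE2_MATH;
#endif
#ifdef __AVX__
    flags |= FP_AVX;
#endif
    return flags;
}

/* Prints a human-readable summary; "x87" means the macro was not defined. */
void print_fp_codegen(void)
{
    int f = fp_codegen_flags();
    printf("float math in SSE regs:  %s\n", (f & FP_SSE_MATH) ? "yes" : "no (x87)");
    printf("double math in SSE regs: %s\n", (f & FP_SSE2_MATH) ? "yes" : "no (x87)");
    printf("AVX (VEX encoding):      %s\n", (f & FP_AVX) ? "yes" : "no");
}
```

Calling print_fp_codegen() from main and comparing, for example, gcc -m32 -mfpmath=387 against gcc -m32 -msse2 -mfpmath=sse should show the difference for 32-bit builds. On x86-64 the SSE lines are normally "yes" even without any -mfpmath option, consistent with sse being the default there.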

The main benefit of -mfpmath=both is more total registers, but managing the x87 register stack often costs extra instructions. Moving data between x87 and AVX registers also costs a store/reload, because there's no direct register-to-register move between the two (a store-forwarding round trip, ~6-cycle latency on Haswell, http://agner.org/optimize/). So it's only really useful if you have two independent sets of calculations for the compiler to interleave. Otherwise it's no better than a normal spill/reload.
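For illustration, here is the kind of code the "two independent sets of calculations" case refers to: two accumulators that never feed into each other. In principle, -mfpmath=both could keep one chain in x87 registers and the other in XMM registers, with each side loading x[i] from memory on its own, so no store/reload is needed to move values between register files. This is a hypothetical example, and nothing guarantees that GCC will actually split the work this way.

```c
#include <stddef.h>

/* Two independent dependency chains over the same input array.
 * Neither accumulator depends on the other, so a compiler could place
 * them in different register files without any cross-file moves. */
void sum_and_sumsq(const double *x, size_t n, double *sum, double *sumsq)
{
    double s  = 0.0; /* chain 1: plain sum */
    double s2 = 0.0; /* chain 2: sum of squares */
    for (size_t i = 0; i < n; i++) {
        s  += x[i];
        s2 += x[i] * x[i];
    }
    *sum = s;
    *sumsq = s2;
}
```

If the two results had to be combined inside the loop (say, s += s2 * x[i]), every iteration would need a cross-file move through memory, which is exactly the case where -mfpmath=both loses.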

Last time I looked at gcc -O3 -mfpmath=both, the results were not impressive: https://godbolt.org/g/p2KLEC shows gcc 5.4 using some store/reloads to bounce data between x87 and xmm (AVX) registers. It would have been better off just keeping some of the constants in memory.
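To reproduce this kind of comparison yourself, a small scalar function with several FP constants is enough. The function below is a hypothetical test case, not the code behind the Godbolt link above. Compile it with something like gcc -O3 -mavx -mfpmath=both -S and again with -mfpmath=sse, then look for values that get stored to the stack and reloaded only to move them between x87 (fld/fstp) and XMM (vmovsd) registers.

```c
/* Hypothetical test case: two independent Horner-scheme polynomials
 * whose coefficients are FP constants, then one multiply that joins
 * the two results. */
double two_polys(double x, double y)
{
    /* p(x) = 1 + 2x + 3x^2 + 4x^3, evaluated by Horner's rule */
    double p = ((4.0 * x + 3.0) * x + 2.0) * x + 1.0;
    /* q(y) = 0.5 - 0.25y + 0.125y^2 - 0.0625y^3 */
    double q = ((-0.0625 * y + 0.125) * y - 0.25) * y + 0.5;
    return p * q; /* joins the two chains */
}
```

The final p * q is a natural spot for a cross-file transfer if the compiler put p and q in different register files, so it is worth checking how the generated code handles that multiply.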