Is it safe to assume that all machines on which AVX2 is supported also support F16C instructions? So far I haven't encountered any machine that has AVX2 but not F16C. Thanks
If you're doing runtime CPU dispatching, you should check for the F16C feature bit if you use it.
But I don't think any current real CPUs have AVX2 without it, and it's part of -march=x86-64-v3, which is a common Haswell-level baseline. You could encounter this feature combination in an emulator or VM that masked off some CPUID bits. That's possible but unlikely, and such a virtual machine might not run all software correctly, since some programs make assumptions about which feature bits go together. Or it's remotely possible that some future CPU will have a bug in its F16C instructions and a microcode update will disable those but not AVX2. (This doesn't seem very plausible since they're just simple hardware execution units that aren't special or tricky.)
So you shouldn't bother manually vectorizing separate AVX2 versions with and without F16C. If someone disables F16C in their VM, it would be very reasonable for your program to just fall back to your SSE2 or SSE4 version, or whatever version you have that's not AVX2.
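Here's a minimal sketch of that dispatch policy, assuming GCC or Clang on x86 (it uses their `<cpuid.h>` helpers and inline asm for `xgetbv`). The names `kernel_t`, `choose_kernel` and `detect_kernel` are made up for illustration; the CPUID bit positions are the documented ones (leaf 1 ECX bit 29 for F16C, leaf 7 EBX bit 5 for AVX2). The point is just that a missing F16C bit drops you to the non-AVX2 path instead of a third code path.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrappers around the CPUID instruction */
#endif

/* Hypothetical kernel IDs: one AVX2+F16C version, one non-AVX2 fallback. */
typedef enum { KERNEL_FALLBACK, KERNEL_AVX2_F16C } kernel_t;

/* Policy: use the AVX2 version only if F16C is also reported.
   No separate "AVX2 without F16C" version. */
kernel_t choose_kernel(bool has_avx2, bool has_f16c) {
    return (has_avx2 && has_f16c) ? KERNEL_AVX2_F16C : KERNEL_FALLBACK;
}

/* Query the CPU and OS, then apply the policy above. */
kernel_t detect_kernel(void) {
    bool avx2 = false, f16c = false;
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        bool osxsave  = (ecx >> 27) & 1;  /* OS uses XSAVE, so XGETBV is usable */
        bool avx      = (ecx >> 28) & 1;
        bool f16c_bit = (ecx >> 29) & 1;
        bool ymm_ok   = false;
        if (osxsave) {
            unsigned int lo, hi;
            /* Read XCR0; bits 1 and 2 mean the OS saves XMM and YMM state. */
            __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
            (void)hi;
            ymm_ok = (lo & 0x6u) == 0x6u;
        }
        if (avx && ymm_ok) {
            f16c = f16c_bit;
            if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
                avx2 = (ebx >> 5) & 1;
        }
    }
#endif
    return choose_kernel(avx2, f16c);
}
```

You'd call `detect_kernel()` once at startup and store a function pointer for whichever version it picks.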
Or if you don't do runtime CPU detection and are wondering whether to build with -march=x86-64-v3 and use half-precision floats to save memory bandwidth in a program that uses AVX2, yeah go for it. Via's first AVX2 CPU lacks FMA (so it isn't x86-64-v3 anyway), but it's pretty much only found in embedded systems and is very old at this point.
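For the build-time route, a rough sketch of what storing data as half-precision and widening on load could look like with the F16C intrinsics. The function names and the multiple-of-8 length restriction are my own simplifications. The `target` attribute (GCC/Clang) is only there so the snippet compiles without extra flags; if you build the whole file with -march=x86-64-v3 you don't need it.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Enables AVX2 + F16C for the marked functions only (GCC/Clang). */
#define V3_TARGET __attribute__((target("avx2,f16c")))

/* Convert n floats to IEEE binary16, 8 at a time.
   Assumes n is a multiple of 8 to keep the sketch short. */
V3_TARGET void floats_to_halves(const float *src, uint16_t *dst, size_t n) {
    for (size_t i = 0; i < n; i += 8) {
        __m256 v = _mm256_loadu_ps(src + i);
        __m128i h = _mm256_cvtps_ph(v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
        _mm_storeu_si128((__m128i *)(dst + i), h);
    }
}

/* Sum n half-precision values, widening to float on load so that memory
   traffic is 2 bytes per element while the arithmetic stays in float. */
V3_TARGET float sum_halves(const uint16_t *src, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    for (size_t i = 0; i < n; i += 8) {
        __m128i h = _mm_loadu_si128((const __m128i *)(src + i));
        acc = _mm256_add_ps(acc, _mm256_cvtph_ps(h));
    }
    /* Horizontal sum of the 8 lanes, done the simple way. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (int k = 0; k < 8; ++k)
        total += lanes[k];
    return total;
}
```

Note there's no CPU check here at all, so this is only appropriate when you've decided the binary requires an x86-64-v3 machine.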