c++ c x86 intrinsics half-precision-float

Using Half Precision Floating Point on x86 CPUs


I intend to use half-precision floating-point numbers in my code, but I cannot figure out how to declare them. For example, I want to do something like the following:

fp16 a_fp16;
bfloat a_bfloat;

However, the compiler does not recognize these types (fp16 and bfloat are just placeholder names, for demonstration purposes).

I remember reading that bfloat support was added in GCC 10, but I cannot find it in the manual. I am especially interested in bfloat numbers.

Additional questions:

  1. Does FP16 have hardware support on Intel / AMD CPUs today? I think native hardware support has been available since Ivy Bridge. (https://scicomp.stackexchange.com/questions/35187/is-half-precision-supported-by-modern-architecture)
  2. I want to confirm whether using FP16 will actually increase FLOPS. I remember reading somewhere that all arithmetic operations on fp16 are internally converted to fp32 first, so FP16 only affects cache footprint and bandwidth.
  3. Is there SIMD intrinsic support for half-precision floats, especially bfloat? (I am aware of intrinsics like _mm256_mul_ph, but I am not sure how to pass the 16-bit FP data type to them; I would really appreciate it if someone could explain this too.)
  4. Have these types been added to the Intel compilers as well?

PS: A related post is "Half-precision floating-point arithmetic on Intel chips", but it does not cover how to declare half-precision floating-point numbers.

TIA


Solution

  • Neither the C++ nor the C language has arithmetic types for half floats.

    The GCC compiler supports half floats as a language extension. Quote from the documentation:

    On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit) floating point via the _Float16 type. For C++, x86 provides a builtin type named _Float16 which contains same data format as C.

    ...

    On x86 targets with SSE2 enabled, without -mavx512fp16, all operations will be emulated by software emulation and the float instructions. The default behavior for FLT_EVAL_METHOD is to keep the intermediate result of the operation as 32-bit precision. This may lead to inconsistent behavior between software emulation and AVX512-FP16 instructions. Using -fexcess-precision=16 will force round back after each operation.

    Using -mavx512fp16 will generate AVX512-FP16 instructions instead of software emulation. The default behavior of FLT_EVAL_METHOD is to round after each operation. The same is true with -fexcess-precision=standard and -mfpmath=sse. If there is no -mfpmath=sse, -fexcess-precision=standard alone does the same thing as before, It is useful for code that does not have _Float16 and runs on the x87 FPU.
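As a minimal sketch of what the quoted documentation describes, the following C fragment declares `_Float16` values and does arithmetic on them. The names `half_t`, `HAVE_FLOAT16`, `half_roundtrip` and `half_add` are hypothetical and not part of any library. The check on `__FLT16_MANT_DIG__` assumes the compiler predefines that macro when `_Float16` is available (recent GCC and Clang do); otherwise the sketch falls back to `float` so that it still compiles.

```c
#include <stddef.h>

/* _Float16 is a compiler extension, not available on every compiler/target.
   Assumption: __FLT16_MANT_DIG__ is predefined when the type exists. */
#if defined(__FLT16_MANT_DIG__)
typedef _Float16 half_t;      /* real 16-bit IEEE half precision */
#define HAVE_FLOAT16 1
#else
typedef float half_t;         /* fallback so the sketch still compiles */
#define HAVE_FLOAT16 0
#endif

/* Narrow a float to half precision, then widen it back to float. */
float half_roundtrip(float x)
{
    half_t h = (half_t)x;     /* rounds to the nearest representable half */
    return (float)h;
}

/* Add two values as halves. Without -mavx512fp16 the addition itself may be
   carried out in float and rounded when stored into a half_t variable. */
float half_add(float a, float b)
{
    half_t ha = (half_t)a;
    half_t hb = (half_t)b;
    half_t sum = ha + hb;
    return (float)sum;
}

/* Storage size of the half type in bytes (2 when _Float16 is available). */
size_t half_size(void)
{
    return sizeof(half_t);
}
```

Called from a `main` function, `half_add(1.0f, 2.0f)` yields `3.0f`, since small integers are exactly representable in half precision. To print a `half_t` with `printf`, cast it to `float` or `double` first. As the quote says, building with `-mavx512fp16` on a CPU that supports it makes GCC emit AVX512-FP16 instructions instead of emulating the operations.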