I intend to use half-precision floating-point numbers in my code, but I cannot figure out how to declare them. For example, I want to do something like the following:
fp16 a_fp16;
bfloat a_bfloat;
However, the compiler does not seem to know these types (fp16 and bfloat are just dummy types, for demonstration purposes).
I remember reading that bfloat support was added in GCC 10, but I am not able to find it in the manual. I am especially interested in bfloat floating-point numbers.
Additional Questions -
fp16 are internally converted to fp32 first, and only affect cache footprint and bandwidth.
bfloat (I am aware of intrinsics like _mm256_mul_ph, but I am not sure how to pass the 16-bit FP datatype; I would really appreciate it if someone could highlight this too)
PS - Related Post - Half-precision floating-point arithmetic on Intel chips, but it does not cover declaring half-precision floating-point numbers.
TIA
Neither C++ nor C has arithmetic types for half-precision floats.
The GCC compiler supports half floats as a language extension. Quote from the documentation:
On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit) floating point via the _Float16 type. For C++, x86 provides a builtin type named _Float16 which contains same data format as C.
...
On x86 targets with SSE2 enabled, without -mavx512fp16, all operations will be emulated by software emulation and the float instructions. The default behavior for FLT_EVAL_METHOD is to keep the intermediate result of the operation as 32-bit precision. This may lead to inconsistent behavior between software emulation and AVX512-FP16 instructions. Using -fexcess-precision=16 will force round back after each operation.
Using -mavx512fp16 will generate AVX512-FP16 instructions instead of software emulation. The default behavior of FLT_EVAL_METHOD is to round after each operation. The same is true with -fexcess-precision=standard and -mfpmath=sse. If there is no -mfpmath=sse, -fexcess-precision=standard alone does the same thing as before, It is useful for code that does not have _Float16 and runs on the x87 FPU.
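To make the quoted documentation concrete, here is a minimal sketch of declaring and using _Float16, plus a small illustration of why keeping intermediates in 32-bit precision can give a different result than rounding after each operation. It assumes GCC 12 or newer on x86 with SSE2 (my reading of when this support arrived); on anything else it falls back to float so it still compiles. The names half_t, HAVE_FLOAT16, product_demo, sum_round_once, sum_round_each and self_check are made up for this example, and the two sum functions emulate the rounding behaviours with explicit casts rather than relying on any particular -fexcess-precision setting.

```cpp
#include <cassert>

// Assumption: _Float16 on x86 needs GCC 12+ and SSE2; otherwise fall back to
// float so the sketch still builds. half_t and HAVE_FLOAT16 are illustrative names.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 12 && defined(__SSE2__)
typedef _Float16 half_t;
#define HAVE_FLOAT16 1
#else
typedef float half_t;
#define HAVE_FLOAT16 0
#endif

// Declare two half-precision values and multiply them.
float product_demo() {
    half_t a = static_cast<half_t>(1.5f);
    half_t b = static_cast<half_t>(2.0f);
    half_t c = a * b;
    return static_cast<float>(c);  // convert to float for printing or comparing
}

// Add three values in 32-bit float and round to half precision only once.
float sum_round_once(half_t x, half_t y, half_t z) {
    float wide = static_cast<float>(x) + static_cast<float>(y) + static_cast<float>(z);
    return static_cast<float>(static_cast<half_t>(wide));
}

// Add the same values, rounding to half precision after every addition.
float sum_round_each(half_t x, half_t y, half_t z) {
    half_t partial = static_cast<half_t>(static_cast<float>(x) + static_cast<float>(y));
    half_t total = static_cast<half_t>(static_cast<float>(partial) + static_cast<float>(z));
    return static_cast<float>(total);
}

// Small self-check that can be called from main().
void self_check() {
    const half_t big = static_cast<half_t>(2048.0f);
    const half_t one = static_cast<half_t>(1.0f);
    assert(product_demo() == 3.0f);
    assert(sum_round_once(big, one, one) == 2050.0f);
    if (HAVE_FLOAT16) {
        // 2049 is not representable in binary16, so each step rounds back to 2048.
        assert(sum_round_each(big, one, one) == 2048.0f);
    }
}
```

Calling self_check() from main() exercises both paths. With a real 16-bit type and the default round-to-nearest mode, the two sums differ (2050 versus 2048), which is the kind of inconsistency between software emulation and AVX512-FP16 instructions that the quoted paragraph warns about.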