Search code examples
c++floating-point16-bithalf-precision-float

Why is there no 2-byte float and does an implementation already exist?


Assuming I am really pressed for memory and want a smaller range (similar to short vs int). Shader languages already support half for a floating-point type with half the precision (not just convert back and forth for the value to be between -1 and 1, that is, return a float like this: shortComingIn / maxRangeOfShort). Is there an implementation that already exists for a 2-byte float?

I am also interested to know any (historical?) reasons as to why there is no 2-byte float.


Solution

  • Re: Implementations: Someone has apparently written half for C, which would (of course) work in C++: https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/cellperformance-snippets/half.c

    Re: Why is float four bytes: Probably because below that, their precision is so limited. In IEEE-754, a "half" only has 11 bits of significand precision, yielding about 3.311 decimal digits of precision (vs. 24 bits in a single yielding between 6 and 9 decimal digits of precision, or 53 bits in a double yielding between 15 and 17 decimal digits of precision).