
Minimum / Maximum numbers that can be represented in floating point


How do I calculate the minimum and maximum decimal values that can be represented in IEEE 754 binary16, binary32 and binary64 (16-, 32- and 64-bit) floating point?


Solution

  • The NORMAL ranges are:

    • 16-bit (half precision): ±6.10e-5 to ±65504.0
    • 32-bit (single precision): ±1.18e−38 to ±3.4e38
    • 64-bit (double precision): ±2.23e−308 to ±1.80e308

    If you allow for DENORMALS as well, then the minimum magnitudes are:

    • 16-bit: ±5.96e-8
    • 32-bit: ±1.40e-45
    • 64-bit: ±4.94e-324

    Always keep in mind that just because a number is in this range doesn't mean it can be exactly represented. Each format has only finitely many bit patterns, while every interval of real numbers contains infinitely many values, so floating-point numbers necessarily skip values at any range. The classic example is 1/3, which has no exact representation in any finite precision, in either binary or decimal formats. In general, the binary formats can precisely represent only the so-called "dyadic" rationals, i.e., those of the form A/2^B for integers A and B, and only when A fits in the significand (11, 24 or 53 bits for the 16-, 32- and 64-bit formats), 1/2^B is no smaller than the smallest denormal, and the result does not exceed the maximum.
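As a cross-check of the figures listed above, here is a minimal Python sketch (the language is an arbitrary choice; the answer itself has no code) that derives the three limits from each format's exponent and fraction widths (5/10, 8/23 and 11/52 bits). The names `FORMATS`, `PATTERNS`, `limits` and `decode` are made up for illustration; only `math.ldexp` and the standard `struct` module are used, with struct's 'e', 'f' and 'd' codes decoding the corresponding bit patterns as a second opinion.

```python
import math
import struct

# Field widths of the IEEE 754 binary interchange formats:
# (exponent bits, stored fraction bits). Normal numbers carry one more
# implicit leading significand bit.
FORMATS = {
    "binary16": (5, 10),
    "binary32": (8, 23),
    "binary64": (11, 52),
}

def limits(exp_bits, frac_bits):
    """Return (min_denormal, min_normal, max_finite) as Python floats.

    All three values are exactly representable in binary64, so
    math.ldexp computes them without any rounding.
    """
    bias = 2 ** (exp_bits - 1) - 1
    # Exponent field 1, fraction 0: 1.0 * 2^(1 - bias)
    min_normal = math.ldexp(1.0, 1 - bias)
    # Exponent field 0, fraction 0...01: 2^-frac_bits * 2^(1 - bias)
    min_denormal = math.ldexp(1.0, 1 - bias - frac_bits)
    # Largest finite exponent is +bias; an all-ones fraction gives 2 - 2^-frac_bits
    max_finite = math.ldexp(2.0 - math.ldexp(1.0, -frac_bits), bias)
    return min_denormal, min_normal, max_finite

# Bit patterns of (min denormal, min normal, max finite) for each format,
# decoded with struct ('>' = big-endian; 'e'/'f'/'d' = binary16/32/64).
PATTERNS = {
    "binary16": (">e", ">H", (0x0001, 0x0400, 0x7BFF)),
    "binary32": (">f", ">I", (0x00000001, 0x00800000, 0x7F7FFFFF)),
    "binary64": (">d", ">Q", (0x1, 0x0010000000000000, 0x7FEFFFFFFFFFFFFF)),
}

def decode(float_code, int_code, bits):
    """Reinterpret an integer bit pattern as a floating-point value."""
    return struct.unpack(float_code, struct.pack(int_code, bits))[0]

for name, (e, p) in FORMATS.items():
    computed = limits(e, p)
    float_code, int_code, patterns = PATTERNS[name]
    decoded = tuple(decode(float_code, int_code, b) for b in patterns)
    assert computed == decoded  # formula and bit patterns must agree
    print(f"{name}: min denormal {computed[0]:.3g}, "
          f"normal {computed[1]:.3g} to {computed[2]:.6g}")
```

Deriving the limits from the field widths makes the pattern visible: the exponent width alone sets the normal range, while the fraction width additionally sets how far below the smallest normal the denormals reach.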
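To see the exactness point on concrete values, here is a minimal Python sketch using a hypothetical helper `is_exact` (not a library function). It converts an exact rational (`fractions.Fraction`) into the target format via `struct` and back, and reports whether anything was lost. It assumes inputs are given as ints, Fractions or decimal strings rather than floats, since a float literal such as 0.1 has already been rounded before the check could see it.

```python
import struct
from fractions import Fraction

def is_exact(x, code="d"):
    """True if the rational x is exactly representable in the struct
    format `code` ('e' = binary16, 'f' = binary32, 'd' = binary64).

    float(x) rounds to the nearest binary64 value and packing with a
    narrower code may round again. If x is representable in the target
    format it is also exact in binary64, so both steps are lossless;
    otherwise the round trip yields a different value.
    """
    x = Fraction(x)  # ints, Fractions and decimal strings are converted exactly
    try:
        y = struct.unpack(code, struct.pack(code, float(x)))[0]
    except OverflowError:  # larger than the format's maximum finite value
        return False
    return Fraction(y) == x

print(is_exact(Fraction(1, 3)))        # False: 1/3 is not dyadic
print(is_exact("0.1"))                 # False: 1/10 is not dyadic either
print(is_exact(Fraction(3, 8)))        # True: 3/2^3 is dyadic and in range
print(is_exact(2**53 + 1))             # False: dyadic, but needs 54 significant bits
print(is_exact(2049, "e"))             # False: binary16 has only 11 significant bits
print(is_exact(65504, "e"), is_exact(65536, "e"))  # True False: max vs. overflow
print(is_exact(Fraction(1, 2**1075)))  # False: finer than the smallest binary64 denormal
```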