
What is theoretically the smallest floating point format possible?


Assuming you are considering IEEE-754 format for floating point numbers for things like single and double precision, what is the smallest floating point format you could possibly have?

I know there are half-floats and minifloats, but how small can a format get and still make sense? I realize that such a format may have no practical application.

I'm trying to determine the smallest mantissa bit width and the smallest exponent width you could have.

For instance, does it make sense to have a mantissa that is in X.X format (assuming single precision would be represented as X.XXXXXXXXXXXXXXXXXXXXXXX)? Also, does it make sense to have an exponent with width 1?

As an example of what I'm thinking:

If you had X.X format and no exponent, then your only possible numbers would be +/- {1.0, 1.1} in binary (1.0 and 1.5 in decimal). Is there something fundamental about floating point numbers or formats that makes these impossible to consider?


Solution

  • I have occasionally used a four-bit FP format: 1 sign bit, 2 exponent bits, and 1 significand bit. This gives you the following set of values:

    encoding    value
      x000     +/-0.0
      x001     +/-0.5
      x010     +/-1.0
      x011     +/-1.5
      x100     +/-2.0
      x101     +/-3.0
      x110     +/-Inf
      x111        NaN
    

    Obviously, you can't do much useful computation with this format, but it's useful for teaching because it's the smallest format that gives you all of the interesting edge cases (no signaling NaN, though, if you care about that, unless you want to make "-NaN" signaling).

    In some sense, this is the "smallest" floating-point format that isn't totally degenerate, but you'd still never use it because it's worse in basically every way than a 4-bit signed fixed-point format with one fractional bit. The smallest floating-point format that really beats fixed-point of the same width in a general setting is half precision (though there are some niche uses for 8-bit formats).

    The three-bit format with no significand bits almost works; it gives you +/-0, +/-1, +/-2, and +/-Inf, but there's no NaN encoding available if you follow the usual IEEE-754 encoding rules. It would be nicer to use b010 for Inf and b011 for NaN, but then no rounding ever occurs in arithmetic (except for 1 + 1 overflowing), which isn't very useful for teaching.
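
The encodings discussed above can be checked with a small decoder. The following is a minimal sketch, assuming the usual IEEE-754 conventions (bias of 2^(e-1) - 1, all-zeros exponent for zero and subnormals, all-ones exponent for Inf and NaN). The function name `decode` and its parameters are made up for illustration and are not from any library. It also compares the finite four-bit values against a 4-bit two's-complement fixed-point format with one fractional bit.

```python
import math

def decode(bits, exp_bits, sig_bits):
    """Decode a tiny IEEE-754-style encoding into a Python float.

    Layout (hypothetical helper): 1 sign bit, exp_bits exponent bits,
    sig_bits stored significand bits, most significant first.
    """
    sign = -1.0 if (bits >> (exp_bits + sig_bits)) & 1 else 1.0
    exp = (bits >> sig_bits) & ((1 << exp_bits) - 1)
    frac = bits & ((1 << sig_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    scale = 1 << sig_bits                      # denominator of the fraction field

    if exp == (1 << exp_bits) - 1:             # all-ones exponent: Inf or NaN
        return sign * math.inf if frac == 0 else math.nan
    if exp == 0:                               # zero or subnormal: 0.f * 2^(1 - bias)
        return sign * (frac / scale) * 2.0 ** (1 - bias)
    return sign * (1 + frac / scale) * 2.0 ** (exp - bias)   # normal: 1.f * 2^(exp - bias)

# Four-bit format: 1 sign bit, 2 exponent bits, 1 significand bit.
for b in range(8):
    print(f"x{b:03b}  {decode(b, 2, 1)}")

# Three-bit format: 1 sign bit, 2 exponent bits, no significand bits.
three_bit = [decode(b, 2, 0) for b in range(8)]
print(three_bit)

# Finite values of the four-bit format (+0.0 and -0.0 collapse to one entry).
fp_finite = {v for v in (decode(b, 2, 1) for b in range(16)) if math.isfinite(v)}

# 4-bit two's-complement fixed point with one fractional bit: -4.0 .. 3.5 in steps of 0.5.
fixed_point = {(n - 16 if n >= 8 else n) / 2 for n in range(16)}

print(len(fp_finite), len(fixed_point))
print(fp_finite <= fixed_point)   # is every finite FP value also a fixed-point value?
```

Under these assumptions, every finite value of the four-bit floating-point format is also representable in the fixed-point format, which additionally has more distinct values and a slightly wider range. What the floating-point format offers in exchange is signed zero, infinities, and NaN.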