
Largest number a floating point number can hold while still retaining a certain amount of decimal precision


I would like to know the largest positive number a 32-bit float can hold while still being able to represent a resolution of approximately 1/1000.

So, for example, if the float represents kilowatts, how big can the kilowatt value get before I would lose the ability to convert it to watts without significant loss of precision (say, a few watts)?


Solution

  • I assume that you want the distance between two consecutive floats to be less than 1/1000, so as to have a precision of 1 watt or better.

    This is related to the unit of least precision (ulp) of the float.

    In binary formats, the magnitude of a normal float has the general form 1.fractionBits * 2^exp.

    If the float has a precision p:

    • its significand, composed of the leading 1 and the fraction bits, has p bits;
    • there are p-1 fraction bits;
    • the leading 1 represents a quantity 2^exp;
    • the first fraction bit represents a quantity 2^(exp-1);
    • the last fraction bit represents a quantity 2^(exp-(p-1)), which is the ulp of the float.
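As a quick check of the ulp formula above, here is a minimal Python sketch; the helper name `ulp_from_formula` is hypothetical and not part of the original answer. It assumes a nonzero, normal (not subnormal) input and uses `math.frexp` to recover the exponent. Python floats are IEEE 754 doubles on common platforms, so `math.ulp` (Python 3.9+) can serve as a reference for p = 53.

```python
import math

def ulp_from_formula(x: float, p: int) -> float:
    """ulp of a nonzero, normal binary float with p significand bits,
    computed as 2^(exp - (p - 1)) where |x| = 1.fractionBits * 2^exp."""
    m, e = math.frexp(abs(x))   # |x| = m * 2^e with 0.5 <= m < 1
    exp = e - 1                 # rewrite as 1.f * 2^exp
    return math.ldexp(1.0, exp - (p - 1))

# Python floats are doubles (p = 53), so math.ulp can confirm the formula.
for x in (1.0, 1000.0, 16383.5, 8.0e12):
    assert ulp_from_formula(x, 53) == math.ulp(x)
    print(x, ulp_from_formula(x, 53))
```

The same function with p = 24 gives the single-precision spacing, for example 2^-10 for any value in [2^13, 2^14).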

    Now the requirement is ulp < 1/1000, that is, 2^(exp+1-p) < 1/1000.

    Since the ulp is always a power of two, and 2^-10 = 1/1024 < 1/1000 < 1/512 = 2^-9, this is the same as requiring ulp <= 1/1024, that is 2^-10:

     exp+1-p <= -10
    

    So the float exponent must be

    exp <= p-11
    

    For IEEE 754:

    • single precision, p=24, exp<=13: the float magnitude must be < 2^14 = 16384.0.
    • double precision, p=53, exp<=42: the float magnitude must be < 2^43, approximately 8.8 * 10^12.
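The single-precision bound can be checked empirically. Python has no native 32-bit float, so this sketch (hypothetical helper names `to_f32_bits`, `from_f32_bits`, `f32_ulp`) rounds values through the standard `struct` module's `'f'` format and steps to the next float32 bit pattern to measure the actual spacing on either side of 2^14. It assumes positive, finite inputs within float32 range.

```python
import struct

def to_f32_bits(x: float) -> int:
    """Round x to the nearest float32 and return its raw 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def from_f32_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as a float32 (widened to a Python float)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def f32_ulp(x: float) -> float:
    """Spacing between float32(x) and the next float32 above it.

    Assumes x is positive and finite: for positive floats, adding 1 to the
    bit pattern gives the next representable value."""
    bits = to_f32_bits(x)
    return from_f32_bits(bits + 1) - from_f32_bits(bits)

# Largest float32 strictly below 2^14, then the spacing in watts for kW values.
largest_ok = from_f32_bits(to_f32_bits(16384.0) - 1)
print("largest float32 below 2^14:", largest_ok)
for kw in (1000.0, 8191.5, largest_ok, 16384.0, 20000.0):
    gap_w = f32_ulp(kw) * 1000.0
    print(f"{kw:>16} kW -> spacing {gap_w:.4f} W, resolves 1 W: {gap_w <= 1.0}")
```

Just below 16384 kW the spacing is 2^-10 kW (about 0.98 W); from 16384 kW upward it jumps to 2^-9 kW (about 1.95 W).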

    If a precision of a few watts is enough, the same arithmetic applies: a tolerance of 2 watts doubles the limit, 4 watts doubles it again, 8 watts again, and so on.
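To turn a tolerance in watts into a bound on the kilowatt value, here is a minimal sketch (hypothetical helper `max_magnitude`, not from the original answer). It finds the largest power of two 2^k not exceeding the tolerance using `math.frexp`, which avoids a rounded logarithm, and returns 2^(exp_max+1) with exp_max = k + p - 1.

```python
import math

def max_magnitude(p: int, tol: float) -> float:
    """Bound below which a normal binary float with p significand bits has ulp <= tol.

    The ulp is 2^(exp + 1 - p).  With k the largest integer such that 2^k <= tol,
    the condition is exp <= k + p - 1, so |x| must stay below 2^(k + p)."""
    _, e = math.frexp(tol)      # tol = m * 2^e with 0.5 <= m < 1
    k = e - 1                   # so 2^k <= tol < 2^(k + 1)
    return math.ldexp(1.0, k + p)

# Tolerance in watts for a value stored in kilowatts, single precision (p = 24).
for watts in (1, 2, 3, 4, 8):
    print(f"{watts} W -> |kW| < {max_magnitude(24, watts / 1000):.0f}")
```

With these inputs the bound goes 16384, 32768, 32768, 65536, 131072: it only moves when the tolerance crosses a power of two, so 3 W buys nothing over 2 W.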

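    We can generalize the formulation: a precision of 10^-n is 2^(log(10^-n)/log(2)), that is 2^(-n*log2(10)).

    Thus the requirement exp+1-p <= -n*log2(10) gives exp <= p - 1 - n*log2(10), and since exp is an integer, exp <= p - 1 - ceil(n*log2(10)).

    The limit is then abs(float) < 2^(exp+1), that is abs(float) < 2^(p - ceil(n*log2(10))). For n = 3, ceil(3*log2(10)) = ceil(9.97) = 10, which recovers exp <= p - 11.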
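The closed form can be evaluated directly. This sketch (hypothetical names `limit_for_decimals` and `limit_exact`) computes 2^(p - ceil(n*log2(10))) and, because a floating-point `log2` could in principle land on the wrong side of an integer, cross-checks the ceiling against an exact integer computation: ceil(n*log2(10)) is the smallest k with 2^k >= 10^n, which equals the bit length of 10^n - 1.

```python
import math

def limit_for_decimals(p: int, n: int) -> float:
    """Bound 2^(p - ceil(n*log2(10))) below which a normal float with
    p significand bits has ulp <= 10^-n."""
    return math.ldexp(1.0, p - math.ceil(n * math.log2(10)))

def limit_exact(p: int, n: int) -> float:
    """Same bound, with ceil(n*log2(10)) computed exactly in integers as the
    smallest k with 2^k >= 10^n, i.e. (10^n - 1).bit_length()."""
    return math.ldexp(1.0, p - (10 ** n - 1).bit_length())

for n in range(0, 7):
    assert limit_for_decimals(24, n) == limit_exact(24, n)
    print(f"n={n}: float32 |x| < {limit_for_decimals(24, n):.0f}, "
          f"float64 |x| < {limit_for_decimals(53, n):.0f}")
```

For n = 3 this prints the same bounds as the derivation: 16384 for single precision and 2^43 for double precision.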