
Correcting for floating point arithmetic 'errors' when rounding in pandas


There is a number I have to deal with that I hate (and I am sure there are others).

It comes in two variants:

a17=0.0249999999999999
a18=0.02499999999999999

Case 1:

round(a17,2) gives 0.02
round(a18,2) gives 0.03

Case 2:

round(a17,3)=round(a18,3)=0.025

Case 3:

round(round(a17,3),2)=round(round(a18,3),2)=0.03

but when these numbers are in a data frame...

Case 4:

df=pd.DataFrame([a17,a18])

np.round(df.round(3),2)=[0.02, 0.02]

Why are the answers I get the same as in Case 1?


Solution

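  • When you work with floats, you usually cannot store the EXACT value, only an approximation of it, because floats are held in memory as binary fractions.

    Keep in mind that when you print a float, you always see a decimal approximation of the stored value.
    The printed number and the stored number are not the same thing.

    A double carries only about 15 to 17 significant decimal digits, so digits beyond that in a literal may be absorbed: two literals that look different can be stored as the same value, or land on neighbouring values.

    That is why: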

    >>> round(0.0249999999999999999,2)
    0.03
    >>> round(0.024999999999999999,2)
    0.02
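One way to check what is actually stored, sketched here with the standard `decimal` module: `Decimal(x)` converts a float exactly, so it shows the full stored value rather than the shortened printed form. The variable names are illustrative only.

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, with no rounding.
for literal in ("0.0249999999999999999", "0.024999999999999999", "0.025"):
    stored = float(literal)
    print(f"{literal:>22} -> {Decimal(stored)}")

# The first literal is closer to the double for 0.025 than to any other
# double, so it is stored as exactly the same value as 0.025.
same_as_0025 = float("0.0249999999999999999") == 0.025

# The second literal lands on the next representable double below that.
below_0025 = float("0.024999999999999999") < 0.025

# The stored value for 0.025 is itself slightly above the true 0.025.
stored_above = Decimal(0.025) > Decimal("0.025")

print(same_as_0025, below_0025, stored_above)
```

The first literal is therefore rounded as if it were 0.025, while the second is a slightly smaller number and rounds down.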

    This is true for most programming languages (Fortran, Python, C++, etc.).

    Here is an excerpt from the Python documentation
    (https://docs.python.org/3/tutorial/floatingpoint.html):

    In base 2, 1/10 is the infinitely repeating fraction

    0.0001100110011001100110011001100110011001100110011...

    Stop at any finite number of bits, and you get an approximation. On most machines today, floats are approximated using a binary fraction with the numerator using the first 53 bits starting with the most significant bit and with the denominator as a power of two. In the case of 1/10, the binary fraction is 3602879701896397 / 2 ** 55 which is close to but not exactly equal to the true value of 1/10.

    Many users are not aware of the approximation because of the way values are displayed. Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine. On most machines, if Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display

    >>> 0.1
    0.1000000000000000055511151231257827021181583404541015625

    That is more digits than most people find useful, so Python keeps the number of digits manageable by displaying a rounded value instead

    >>> 1 / 10
    0.1

    Just remember, even though the printed result looks like the exact value of 1/10, the actual stored value is the nearest representable binary fraction.

    Interestingly, there are many different decimal numbers that share the same nearest approximate binary fraction. For example, the numbers 0.1 and 0.10000000000000001 and 0.1000000000000000055511151231257827021181583404541015625 are all approximated by 3602879701896397 / 2 ** 55. Since all of these decimal values share the same approximation, any one of them could be displayed while still preserving the invariant eval(repr(x)) == x.
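The claims in this excerpt can be checked directly. The following is a small sketch using only the standard library; the variable names are illustrative.

```python
from fractions import Fraction

# as_integer_ratio() returns the exact fraction stored for a float.
numerator, denominator = (0.1).as_integer_ratio()
print(numerator, denominator, denominator == 2 ** 55)

# Different decimal literals that are all stored as the same double.
literals_equal = (
    0.1
    == 0.10000000000000001
    == 0.1000000000000000055511151231257827021181583404541015625
)

# The stored value is slightly larger than the true 1/10.
stored_exceeds_tenth = Fraction(0.1) > Fraction(1, 10)

print(literals_equal, stored_exceeds_tenth)
```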

    Here is an excerpt from the NumPy documentation
    (https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.around.html#numpy.around).
    Note that np.round is equivalent to np.around, so this description applies to np.round as well.

    For values exactly halfway between rounded decimal values, NumPy rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0, -0.5 and 0.5 round to 0.0, etc. Results may also be surprising due to the inexact representation of decimal fractions in the IEEE floating point standard [R9] and errors introduced when scaling by powers of ten.
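A short sketch of the half-to-even rule in practice, assuming NumPy is installed. It also contrasts NumPy's result for 0.025 with the built-in `round` under Python 3.

```python
import numpy as np

# Ties go to the nearest even value.
halves = np.array([1.5, 2.5, -0.5, 0.5])
rounded_halves = np.round(halves)
print(rounded_halves)

# Rounding to 2 decimals scales by 100 first. 0.025 * 100 evaluates to
# exactly 2.5 in double precision, so this is a tie and goes to the even 2.
rounded_0025 = np.round(np.array([0.025]), 2)[0]

# Python 3's built-in round() works from the exact stored value of 0.025,
# which is slightly above 0.025, so it rounds up instead.
builtin_0025 = round(0.025, 2)

print(rounded_0025, builtin_0025)
```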

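    Conclusions:

    In your case df.round(3), which relies on NumPy rounding, first turns both values into 0.025. np.round then rounds 0.025 to 2 decimals by scaling it by 100, which gives 2.5, exactly halfway between 2 and 3. By the half-to-even rule described above, that becomes 2, i.e. 0.02 (source: NumPy documentation). In Python 3 the built-in round in Case 3 works from the exact stored value of 0.025 instead, which is slightly above 0.025, so it returns 0.03.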
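The two rounding steps can be traced in plain Python. This is only a sketch: `scale_round` is a hypothetical helper that imitates the scale, round-half-to-even, unscale behaviour described in the NumPy excerpt, not NumPy's actual code. `round_half_up` is a second hypothetical helper showing one possible correction with the standard `decimal` module, assuming that rounding ties upward is what you want.

```python
from decimal import Decimal, ROUND_HALF_UP

a17 = 0.0249999999999999
a18 = 0.02499999999999999

def scale_round(x, decimals):
    """Hypothetical helper: scale, round ties to even, scale back."""
    factor = 10 ** decimals
    # Built-in round() without ndigits rounds ties to the nearest even integer.
    return round(x * factor) / factor

step1 = [scale_round(x, 3) for x in (a17, a18)]  # both become 0.025
step2 = [scale_round(x, 2) for x in step1]       # 2.5 is a tie -> 2 -> 0.02
print(step1, step2)

def round_half_up(x, decimals):
    """Hypothetical helper: round the printed decimal form, ties going up."""
    quantum = Decimal(10) ** -decimals           # e.g. Decimal('0.01')
    # repr(x) is the short decimal form Python prints for the float.
    return float(Decimal(repr(x)).quantize(quantum, rounding=ROUND_HALF_UP))

fixed = [round_half_up(round_half_up(x, 3), 2) for x in (a17, a18)]
print(fixed)
```

In a DataFrame a helper like this could be applied elementwise, for example with `Series.map`, though that will be slower than the vectorised `round`.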