Tags: python, r, numpy, precision, ieee-754

Multiplication of floating point numbers gives different results in Numpy and R


I am doing data analysis in Python (Numpy) and R. My data is a 795067 x 3 array, and computing the mean, median, standard deviation, and IQR of this data yields different results depending on whether I use Numpy or R. I cross-checked the values, and it looks like R gives the "correct" value.

Median: 
Numpy:14.948499999999999
R: 14.9632

Mean: 
Numpy: 13.097945407088607
R: 13.10936

Standard Deviation: 
Numpy: 7.3927612774052083
R: 7.390328

IQR: 
Numpy:12.358700000000002
R: 12.3468

Max and min of the data are the same on both platforms. I ran a quick test to better understand what is going on here.

  • Multiplying 1.2*1.2 in Numpy gives 1.44 (same with R).
  • Multiplying 1.22*1.22 gives 1.4884 in Numpy and the same with R.
  • However, multiplying 1.222*1.222 in Numpy gives 1.4932839999999998 which is clearly wrong! Doing the multiplication in R gives the correct answer of 1.493284.
  • Multiplying 1.2222*1.2222 in Numpy gives 1.4937728399999999 and 1.493773 in R. Once more, R is correct.

In Numpy, the numbers are float64 datatype and they are double in R. What is going on here? Why are Numpy and R giving different results? I know R uses IEEE754 double-precision but I don't know what precision Numpy uses. How can I change Numpy to give me the "correct" answer?


Solution

  • Python

    In Python 2, the print statement formats a float with str(), which rounds to 12 significant digits; it does not switch to single precision. The calculation itself is carried out in the precision of the operands, and both Python and numpy use double-precision floats by default (at least on my 64-bit machine):

    import numpy
    
    single = numpy.float32(1.222) * numpy.float32(1.222)
    double = numpy.float64(1.222) * numpy.float64(1.222)
    pyfloat = 1.222 * 1.222
    
    print single, double, pyfloat
    # 1.49328 1.493284 1.493284
    
    print "%.16f, %.16f, %.16f"%(single, double, pyfloat)
    # 1.4932839870452881, 1.4932839999999998, 1.4932839999999998
    

    In an interactive Python/IPython shell, the result of a statement is displayed with repr(), which shows enough digits to reproduce the double exactly:

    >>> 1.222 * 1.222
    1.4932839999999998
    
    In [1]: 1.222 * 1.222
    Out[1]: 1.4932839999999998
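
To confirm that only the display differs, the same product can be formatted in several ways. This is a minimal sketch for Python 3 (where print already shows the repr() form); the variable names are illustrative only:

```python
# One stored double, three ways of displaying it (Python 3).
product = 1.222 * 1.222

short = format(product, ".7g")    # 7 significant digits
full = repr(product)              # shortest string that round-trips the double
fixed = format(product, ".16f")   # 16 digits after the decimal point

print(short)   # 1.493284
print(full)    # 1.4932839999999998
print(fixed)   # 1.4932839999999998

# Parsing the repr() string gives back exactly the same double.
print(float(full) == product)     # True
```

The stored value is identical in every case; the shorter output is a rounded view of it, not a more accurate result.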
    

    R

    It looks like R does the same as Python when using print and sprintf:

    print(1.222 * 1.222)
    # 1.493284
    
    sprintf("%.16f", 1.222 * 1.222)
    # "1.4932839999999998"
    

    In contrast to interactive Python shells, the interactive R shell also rounds the displayed value (to 7 significant digits by default) when printing the results of statements:

    > 1.222 * 1.222
    [1] 1.493284
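
One way to check whether display rounding alone explains the statistics in the question is to round the Numpy values to 7 significant digits and compare them with what R printed. The sketch below uses only the numbers quoted in the question; the dictionary layout is an illustrative choice:

```python
# Values copied from the question: (Numpy result, value printed by R).
reported = {
    "median": (14.948499999999999, "14.9632"),
    "mean": (13.097945407088607, "13.10936"),
    "std": (7.3927612774052083, "7.390328"),
    "iqr": (12.358700000000002, "12.3468"),
}

# Round each Numpy value to 7 significant digits, R's default display width.
rounded = {name: format(value, ".7g") for name, (value, _) in reported.items()}

# Print the rounded Numpy value next to R's value and whether they agree.
for name, (_, r_value) in reported.items():
    print(name, rounded[name], r_value, rounded[name] == r_value)
```

None of the rounded values matches R's output, so display rounding accounts for the multiplication examples but not for the differing statistics.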
    

    Differences between Python and R

    The differences in your results could come from using single-precision values in numpy. Calculations with many additions/subtractions will eventually make the rounding error visible:

    In [1]: import numpy
    
    In [2]: a = numpy.float32(1.222)
    
    In [3]: a*6
    Out[3]: 7.3320000171661377
    
    In [4]: a+a+a+a+a+a
    Out[4]: 7.3320003
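
The same effect grows with the number of additions. The sketch below keeps a running sum of float32 values once in single and once in double precision; the array size is an arbitrary illustrative choice, and np.cumsum is used only to force a plain left-to-right accumulation:

```python
import numpy as np

n = 100_000  # illustrative size, not the size of the real data
values32 = np.full(n, 1.222, dtype=np.float32)

# Running sum kept in float32: every addition is rounded to single precision.
sum32 = np.cumsum(values32, dtype=np.float32)[-1]

# Same float32 inputs, but the running sum is kept in float64.
sum64 = np.cumsum(values32, dtype=np.float64)[-1]

# Show both totals and the gap between them.
print(sum32, sum64, float(sum32) - float(sum64))
```

The two totals differ by far more than the rounding of any single addition, because the error of each float32 addition accumulates. Reductions such as np.sum usually fare better than this naive running sum, but they still work in the precision of the array unless told otherwise.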
    

    As suggested in the comments on your question, make sure to use double-precision floats in your numpy calculations.
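
A minimal sketch of how that check might look, assuming the data ended up in a float32 array; the `data` array here is a hypothetical stand-in, not the real data set:

```python
import numpy as np

# Hypothetical stand-in for the real 795067 x 3 data set.
data = np.full((1000, 3), 1.222, dtype=np.float32)

print(data.dtype)   # float32 means single precision

# Option 1: convert once, then compute as usual.
data64 = data.astype(np.float64)
print(data64.dtype)

# Option 2: keep the array as is, but accumulate in double precision.
mean64 = data.mean(dtype=np.float64)
std64 = data.std(dtype=np.float64)
print(mean64, std64)
```

Converting after the fact cannot restore digits that were already lost when the values were stored as float32, so where possible it is better to load the data as float64 in the first place.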