Tags: python, arrays, numpy, floating-point, floating-accuracy

Array comparison not matching elementwise comparison in numpy


I have a NumPy array arr. It's a numpy.ndarray with shape (5553110,) and dtype=float32.

When I do:

>>> (arr > np.pi)[3154950]
False
>>> arr[3154950] > np.pi
True

Why is the first comparison getting it wrong? And how can I fix it?

The values:

arr[3154950] = 3.1415927
np.pi = 3.141592653589793

Is the problem with precision?


Solution

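  • The problem is the precision of np.float32 versus np.float64. The float32 value closest to π prints as 3.1415927, which is exactly what the array holds, so once np.pi is rounded to float32 the two values are equal and ">" is False. In float64, np.pi keeps more digits (3.141592653589793) and is slightly smaller than the stored value, so ">" is True.

    Use np.float64 and you will not see the problem: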

    import numpy as np
    
    arr = np.array([3.1415927], dtype=np.float64)
    
    print((arr > np.pi)[0])  # True
    
    print(arr[0] > np.pi)    # True
    

    As @WarrenWeckesser comments:

    It involves how numpy decides to cast the arguments of its operations. Apparently, with arr > scalar, the scalar is converted to the same type as the array arr, which in this case is np.float32. On the other hand, with something like arr > arr2, with both arguments nonscalar arrays, they will use a common data type. That's why (arr > np.array([np.pi]))[3154950] returns True.

    Related github issue
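
To make the casting visible, here is a minimal sketch that does the conversion explicitly instead of relying on NumPy's implicit promotion. It rebuilds a one-element array holding the value from the question; the names pi32, same_in_float32, greater_in_float32 and greater_in_float64 are made up for illustration. As far as I understand, newer NumPy releases (2.x) changed how Python scalars are promoted, so the two original expressions may agree with each other there; the explicit casts below should behave the same either way.

```python
import numpy as np

# One-element stand-in for the array in the question.
arr = np.array([3.1415927], dtype=np.float32)

# What np.pi becomes when it is rounded to the array's dtype.
pi32 = np.float32(np.pi)

# In float32 the stored value and the rounded pi are the same number,
# so "greater than" is False.
same_in_float32 = bool(arr[0] == pi32)
greater_in_float32 = bool((arr > pi32)[0])

# Upcasting the array keeps np.pi at full float64 precision,
# and the stored value is slightly larger than that.
greater_in_float64 = bool((arr.astype(np.float64) > np.pi)[0])

print(same_in_float32)     # True
print(greater_in_float32)  # False
print(greater_in_float64)  # True
```

Note that arr.astype(np.float64) creates a float64 copy, which doubles the memory of the original array for the duration of the comparison. If that is a concern, comparing in chunks is an option.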