Why don't np.where & np.min seem to work right with this array?


The issues

I have an imported array containing values ranging from ~0.0 to ~0.76. When I tried to find the min and max values using NumPy, I ran into some strange inconsistencies. I'd like to know how to fix them if they're my fault, or how to avoid them if they're bugs on the NumPy developers' end.

The code

Let's start with finding the location of the maximum values using np.max & np.where.

print array.shape
print np.max(array)
print np.where(array == 0.763728955743)
print np.where(array == np.max(array))
print array[35,57]

The output is this:

(74, 145)
0.763728955743
(array([], dtype=int64), array([], dtype=int64))
(array([35]), array([57]))
0.763728955743

When I look for where the array exactly equals the maximum entry's value, NumPy doesn't find it. However, when I search for the location of the maximum without typing out its value, it works. Note that this doesn't happen with np.min.

Now I have a different issue regarding minima.

print array.shape
print np.min(array)
print np.where(array == 0.0)
print np.where(array == np.min(array))
print array[10,25], array[31,131]

Look at the returns.

(74, 145)
0.0
(array([10, 25]), array([ 31, 131]))
(array([10, 25]), array([ 31, 131]))
0.0769331747301 1.54220192172e-09

1.54e-9 is close enough to 0.0 that it seems like it could be the minimum value. But why is a location with the value 0.077 also listed by np.where? That's not even close to 0.0 compared to the other value.

The Questions

Why doesn't np.where seem to work when I enter the maximum value of the array, but it does when I search for np.max(array) instead? And why does np.where() combined with np.min() return two locations, one of which is definitely not the minimum value?


Solution

  • You have two issues: the interpretation of floats and the interpretation of the results of np.where.

    1. Floating-point numbers are stored internally in binary, and print shows only a rounded decimal form of them (12 significant digits in your output). Conversely, most decimal literals cannot be represented exactly in binary. The literal 0.763728955743 is therefore not the same number as the one stored in the array, which is why np.where(array == 0.763728955743) returns empty arrays while np.where(array == np.max(array)) does the right thing: the second case uses the exact binary value with no conversion through decimal. The search for the minimum succeeds because 0.0 can be represented exactly in both decimal and binary. In general, it is a bad idea to compare floats using == for this and related reasons; a tolerance-based comparison such as np.isclose is safer.
    2. For the version of np.where that you are using (called with only a condition), it devolves into np.nonzero. You are misinterpreting the results here because it returns one array per dimension of the input, not one array of coordinates per match. There are a number of ways of saying this differently:

      • If you had three matches, you would be getting two arrays back, each with three elements.
      • If you had a 3D input array with two matches, you would get three arrays back, each with two elements.
      • The first array is row-coordinates (dim 0) and the second array is column-coordinates (dim 1).
      • In the maximum case you read the output of where correctly, as row 35 and column 57. In the minimum case you instead read each returned array as if it were one coordinate pair.
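Both points can be reproduced on a small array. The following is a minimal sketch, not your actual data: the array `data`, its values, and the variable names are made up for illustration, and 0.1 + 0.2 is used only because it is a well-known value that prints like 0.3 when rounded but is not equal to it.

```python
import numpy as np

# Hypothetical 2x3 array standing in for the imported (74, 145) data.
stored_max = 0.1 + 0.2  # stored as 0.30000000000000004, not 0.3
data = np.array([[0.0, 0.1, stored_max],
                 [0.2, 0.05, 0.0]])

# Point 1: a literal typed from a rounded printout is a different float.
typed = 0.3
exact_hits = np.where(data == typed)             # no match
max_hits = np.where(data == data.max())          # matches the stored value
close_hits = np.where(np.isclose(data, typed))   # tolerance-based match

# Point 2: np.where returns (row_indices, col_indices), one array per dimension.
rows, cols = np.where(data == 0.0)
zero_coords = list(zip(rows.tolist(), cols.tolist()))

print(exact_hits[0].size)   # 0
print(max_hits)
print(close_hits)
print(zero_coords)          # [(0, 0), (1, 2)]
```

Here `rows` is `[0, 1]` and `cols` is `[0, 2]`. Reading them as the coordinates (0, 1) and (0, 2) would point at 0.1 and the maximum, which is the same mix-up as in the minimum case above.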

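    There are a number of ways of dealing with these issues. The easiest could be to use np.argmax and np.argmin. These return the index of the first maximum or minimum in the array, respectively. For a multidimensional array that index refers to the flattened array, so convert it back to coordinates with np.unravel_index.

    >>> x = np.unravel_index(np.argmax(array), array.shape)
    >>> print(x)
    (35, 57)
    >>> print(array[x])
    0.763728955743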
    

    The only possible problem here is that you may want to get all of the coordinates.

    In that case, using where or nonzero is fine. The only difference from your code is that you should pair the first row index with the first column index, and the second with the second:

    print array[10,31], array[25,131]
    

    instead of array[10,25] and array[31,131], which pair up the indices the wrong way round.
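If reading the per-dimension arrays is error-prone, np.argwhere returns the same matches grouped as one [row, col] pair per match, and the tuple from np.where can be passed straight back as an index. A small sketch, using a made-up array `data` rather than the real one:

```python
import numpy as np

# Hypothetical array with two minima, at (0, 1) and (1, 2).
data = np.array([[0.3, 0.0, 0.7],
                 [0.5, 0.2, 0.0]])

hits = np.where(data == data.min())          # (row_indices, col_indices)
min_coords = np.argwhere(data == data.min()) # one [row, col] pair per match
min_values = data[hits]                      # values at the matched positions

print(min_coords.tolist())   # [[0, 1], [1, 2]]
print(min_values.tolist())   # [0.0, 0.0]
```

Indexing with `data[hits]` is a quick check that every reported location really holds the minimum.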