Tags: python, arrays, numpy, masked-array

np.ma.argmax on a masked array of unsigned integer dtype returns the wrong result in numpy 1.11.0


I stumbled upon some strange behavior involving masked unsigned integer arrays and np.ma.argmax.

Consider the following array:

>>> marr = np.ma.array(np.array([[2,2,2], [3,3,3], [1,1,1]]), mask=False, dtype=np.uint16)
>>> marr
masked_array(data =
 [[2 2 2]
 [3 3 3]
 [1 1 1]],
             mask =
 [[False False False]
 [False False False]
 [False False False]],
       fill_value = 999999)

If I use np.ma.argmax, the result is what I expect:

>>> print(np.ma.argmax(marr, axis=0))
[1 1 1]

If I now mask the last row, the result is wrong:

>>> marr.mask[2] = True
>>> marr
masked_array(data =
 [[2 2 2]
 [3 3 3]
 [-- -- --]],
             mask =
 [[False False False]
 [False False False]
 [ True  True  True]],
       fill_value = 999999)
>>> print(np.ma.argmax(marr, axis=0))
[2 2 2]   # why?

It now treats the masked row as the maximum. I even changed the fill_value to 0, but the result stays the same: it still treats the masked row as the maximum. This seems to affect only unsigned integer arrays.

However, np.ma.MaskedArray.argmax and np.argmax return the correct result:

>>> print(marr.argmax(axis=0))
[1 1 1]
>>> print(np.argmax(marr, axis=0))
[1 1 1]

Why does np.ma.argmax not do the right thing here? As far as I can see, it is defined as the method itself.


Solution

  • Here's the source of np.ma.argmax (shown with IPython's ?? magic) in version '1.11.0':

    Definition:  np.ma.argmax(a, axis=None, fill_value=None)
    Source:
    def argmax(a, axis=None, fill_value=None):
        "Function version of the eponymous method."
        if fill_value is None:
            fill_value = default_fill_value(a)
            try:
                fill_value = -fill_value
            except:
                pass
        d = filled(a, fill_value)
        return d.argmax(axis=axis)
    

    and for the method:

    def argmax(self, axis=None, fill_value=None, out=None):
        if fill_value is None:
            fill_value = maximum_fill_value(self._data)
        d = self.filled(fill_value).view(ndarray)
        return d.argmax(axis, out=out)
    

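    The function uses a different fill value from the method. The method fills masked entries with maximum_fill_value, the smallest value of the dtype (0 for uint16). The function instead negates default_fill_value (999999), and when -999999 is cast to uint16 it wraps around to 48577, which is larger than any of the real data. So the masked row wins: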

    In [180]: np.ma.maximum_fill_value(marr)
    Out[180]: 0
    
    In [181]: np.ma.maximum_fill_value(marr.astype(int))
    Out[181]: -2147483648
    
    In [182]: np.ma.default_fill_value(marr)
    Out[182]: array(999999)
    
    In [183]: -np.ma.default_fill_value(marr)
    Out[183]: -999999
    
    In [184]: np.ma.filled(marr,-np.ma.default_fill_value(marr))
    Out[184]: 
    array([[    2,     2,     2],
           [    3,     3,     3],
           [48577, 48577, 48577]], dtype=uint16)
    
    In [186]: np.ma.filled(marr,np.ma.maximum_fill_value(marr))
    Out[186]: 
    array([[2, 2, 2],
           [3, 3, 3],
           [0, 0, 0]], dtype=uint16)
    
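A small sketch that replays both fill strategies with plain NumPy arrays, so it behaves the same on any NumPy version. The names data, mask, wrapped, bad_filled and good_filled are only for illustration; np.where stands in for the filled() step:

```python
import numpy as np

data = np.array([[2, 2, 2], [3, 3, 3], [1, 1, 1]], dtype=np.uint16)
mask = np.zeros(data.shape, dtype=bool)
mask[2] = True  # mask the last row, as in the question

# 1.11 function path: negate the default fill value (999999) and cast it
# into uint16. Integer casts wrap around modulo 2**16.
wrapped = np.array(-999999).astype(np.uint16)
print(int(wrapped))                 # 48577, same as (-999999) % 2**16

bad_filled = np.where(mask, wrapped, data)
print(bad_filled.argmax(axis=0))    # [2 2 2]: the masked row is now the largest

# Method path: fill with the smallest uint16 value (0).
good_filled = np.where(mask, np.iinfo(np.uint16).min, data)
print(good_filled.argmax(axis=0))   # [1 1 1]
```

Signed dtypes mostly escape this because a negative fill value stays negative (or wraps to a negative number), which is why only unsigned arrays show the problem.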

    It's a bug in my version. A change made in February replaced the argmax (and argmin) functions with wrappers that call the method (argmax = _frommethod('argmax')).

    https://github.com/numpy/numpy/commit/36f76ea2e6e91062df12d3a46ccaed7822bc82f2

    So that correction isn't in my distribution, and presumably not in yours either.
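
The idea behind that change can be sketched as a function that simply forwards to the MaskedArray method of the same name, so the two can no longer disagree. frommethod_sketch is a hypothetical, simplified stand-in; the real _frommethod in numpy.ma is a private helper with extra details (docstring handling, fallbacks), so treat this as an illustration only:

```python
import numpy as np

def frommethod_sketch(method_name):
    """Hypothetical, simplified stand-in for numpy.ma's private _frommethod.

    Returns a module-level function that converts its input to a masked
    array and calls the MaskedArray method of the same name.
    """
    def func(a, *args, **kwargs):
        marr = np.ma.asanyarray(a)
        return getattr(marr, method_name)(*args, **kwargs)
    func.__name__ = method_name
    func.__doc__ = getattr(np.ma.MaskedArray, method_name).__doc__
    return func

argmax = frommethod_sketch('argmax')

marr = np.ma.array([[2, 2, 2], [3, 3, 3], [1, 1, 1]], dtype=np.uint16)
marr[2] = np.ma.masked          # mask the last row
print(argmax(marr, axis=0))     # same result as marr.argmax(axis=0)
```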

    So for now, stick with the method, or pass a correct fill_value yourself:

    In [187]: np.ma.argmax(marr,axis=0,fill_value=0)
    Out[187]: array([1, 1, 1], dtype=int32)
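
If you want a function form that behaves the same on every NumPy version, a minimal sketch is to do what the method does: fill masked entries with np.ma.maximum_fill_value, the smallest value of the array's dtype. masked_argmax is a hypothetical helper name, not part of NumPy:

```python
import numpy as np

def masked_argmax(a, axis=None):
    """Hypothetical helper: argmax that ignores masked entries.

    Masked entries are replaced by np.ma.maximum_fill_value(a), the
    smallest value representable in a's dtype (0 for uint16, -inf for
    floats), so a masked entry can never be strictly larger than an
    unmasked one.
    """
    a = np.ma.asanyarray(a)
    filled = a.filled(np.ma.maximum_fill_value(a))
    return np.asarray(filled).argmax(axis=axis)

marr = np.ma.array([[2, 2, 2], [3, 3, 3], [1, 1, 1]], dtype=np.uint16)
marr[2] = np.ma.masked               # mask the last row, as in the question
print(masked_argmax(marr, axis=0))   # [1 1 1]
```

One caveat shared with the method: if every unmasked entry along the axis equals the dtype minimum (or the whole slice is masked), argmax resolves the tie to the first index, which may be a masked one.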