
unexpected behaviour of numpy.median on masked arrays


I have a question regarding the behaviour of numpy.median() on masked arrays created with numpy.ma.masked_array().

As I understand it from debugging my own code, numpy.median() does not work as expected on masked arrays (see "Using numpy.median on a masked array" for a definition of the problem).

The answer provided was:

Explanation: If I remember correctly, the np.median does not support subclasses, so it fails to work correctly on np.ma.MaskedArray.
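The point about subclasses can be illustrated with a minimal sketch. It shows what happens in general when a masked array is turned into a plain ndarray (here with np.asarray): the data survives but the mask is silently dropped. This is an illustration of subclass-dropping, not a claim about exactly how np.median handles its input internally.

```python
import numpy as np

test = np.array([1, 2, 3, 4, 100, 100, 100, 100])
testm = np.ma.masked_array(test, mask=[False] * 4 + [True] * 4)

# np.asarray returns the base ndarray class: the underlying data is kept,
# but the mask is discarded, so the "masked" 100s become ordinary values again.
plain = np.asarray(testm)
print(type(plain).__name__, plain)

# A median over the converted array therefore uses all eight values.
print(np.median(plain))
```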

The conclusion, therefore, is that to calculate the median of the unmasked elements of a masked array you should use numpy.ma.median(), since that is the median function dedicated to masked arrays.
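A short sketch of two mask-aware ways to get the median, using the same data as the session further down: numpy.ma.median(), or stripping the masked entries with MaskedArray.compressed() before calling the ordinary np.median.

```python
import numpy as np

test = np.array([1, 2, 3, 4, 100, 100, 100, 100])
valid_elements = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
testm = np.ma.masked_array(test, ~valid_elements)

# Option 1: the median function from numpy.ma, which honours the mask.
ma_median = np.ma.median(testm)

# Option 2: compressed() returns a 1-D ndarray of the unmasked values only,
# which the plain np.median can then handle correctly.
plain_median = np.median(testm.compressed())

print(ma_median, plain_median)  # both equal 2.5
```

Note that compressed() always flattens, so it only replaces the whole-array (axis=None) case; for a per-axis median of a multi-dimensional masked array, np.ma.median with its axis argument is the tool to reach for.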

My problem is that I have just spent a considerable amount of time tracking this down, because nothing tells you that it is happening.

There is no warning or exception raised when trying to calculate the median of a masked array via numpy.median().

The value returned by this function is not what one would expect, and this can cause serious problems when people are not aware of it.

Does anyone know if this might be considered a bug?

In my opinion, the expected behaviour should be that using numpy.median on a masked array raises an exception of some sort.
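Until something like that exists, one option is a small guard in your own code. The sketch below uses a hypothetical helper, strict_median, that is not part of NumPy: it refuses masked input outright instead of letting the mask be silently ignored.

```python
import numpy as np


def strict_median(a, axis=None):
    """Hypothetical guard (not a NumPy API): reject masked arrays explicitly."""
    if isinstance(a, np.ma.MaskedArray):
        raise TypeError(
            "strict_median() received a MaskedArray; use np.ma.median() "
            "or pass a.compressed() explicitly"
        )
    # Plain ndarrays (and array-likes) go straight to the normal median.
    return np.median(a, axis=axis)


print(strict_median(np.array([1, 2, 3, 4])))
```

The design choice is to fail loudly at the call site, which is exactly the behaviour argued for above, rather than to guess whether the caller wanted the mask honoured.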

Any thoughts?

The test session below shows the unwanted and unexpected behaviour of using numpy.median on a masked array (note that the correct, expected median of the valid elements is 2.5!):

In [1]: import numpy as np

In [2]: test = np.array([1, 2, 3, 4, 100, 100, 100, 100])

In [3]: valid_elements = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)

In [4]: testm = np.ma.masked_array(test, ~valid_elements)

In [5]: testm
Out[5]: 
masked_array(data = [1 2 3 4 -- -- -- --],
             mask = [False False False False  True  True  True  True],
       fill_value = 999999)

In [6]: np.median(test)
Out[6]: 52.0

In [7]: np.median(test[valid_elements])
Out[7]: 2.5

In [8]: np.median(testm)
Out[8]: 4.0

In [9]: np.ma.median(testm)
Out[9]: 2.5

Solution

  • Does anyone know if this might be considered a bug?

    Well, it is a bug! I posted it a few months ago on their issue tracker (Link to the bug report).

    The reason for this behaviour is that np.median relies on the partition method of the input array, but np.ma.MaskedArray doesn't override partition. So when arr.partition is called inside np.median, it simply falls back to the basic numpy.ndarray.partition method, which reorders the raw data and ignores the mask (bogus for a masked array!).
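To see why the session returns 4.0 (rather than 52.0 or 2.5), the sketch below re-enacts that mechanism by hand. It assumes a simplified model of np.median: partition the raw data around the two middle positions, then average whatever sits there. It is an illustration of the explanation above, not NumPy's actual implementation.

```python
import numpy as np

test = np.array([1, 2, 3, 4, 100, 100, 100, 100])
valid_elements = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
testm = np.ma.masked_array(test, ~valid_elements)

# ndarray.partition works on the raw buffer only; it has no notion of the mask.
data = testm.data.copy()
n = data.size
kth = [n // 2 - 1, n // 2]  # the two middle positions for an even length
data.partition(kth)         # in place; positions 3 and 4 now hold 4 and 100

# The mask is not reordered along with the data, so position 4 is still flagged.
middle = np.ma.masked_array(data[kth], mask=testm.mask[kth])

# Averaging the "middle pair" then skips the masked 100 and yields 4.0,
# which matches Out[8] rather than the expected 2.5.
print(middle.mean())
```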