python arrays numpy zero median

How can I ignore zeros when I take the median on columns of an array?


I have a simple numpy array.

array([[10,   0,  10,  0],
       [ 1,   1,   0,  0],
       [ 9,   9,   9,  0]
       [ 0,  10,   1,  0]])

I would like to take the median of each column, individually, of this array.

However, there are a few 0 values in various places which I would like to ignore in the calculation of the medians.

To complicate matters further, I would like columns that contain only 0 entries to keep a median of 0. That way those columns act as placeholders and the dimensions of the result stay the same.

The signature of numpy.median in the documentation doesn't seem to have any argument that would do what I want (maybe I am spoiled by the many switches we get with R!)

numpy.median(a, axis=None, out=None, overwrite_input=False)

Can someone please shed some light on an effective way to do this, which is in line with the spirit of numpy? I could hack it out but in that case I feel like I've defeated the purpose of using numpy in the first place.

Thanks in advance.


Solution

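  • Use a masked array: mask the zeros, then call np.ma.median(y, axis=0).filled(0) to get the medians of the columns. A column that is entirely zero comes out masked, and filled(0) turns it back into 0.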

    In [1]: x = np.array([[10, 0, 10, 0], [1, 1, 0, 0], [9, 9, 9, 0], [0, 10, 1, 0]])
    In [2]: y = np.ma.masked_where(x == 0, x)
    In [3]: x
    Out[3]: 
    array([[10,  0, 10, 0],
           [ 1,  1,  0, 0],
           [ 9,  9,  9, 0],
           [ 0, 10,  1, 0]])
    In [4]: y
    Out[4]: 
    masked_array(data =
     [[10 -- 10 --]
     [1 1 -- --]
     [9 9 9 --]
     [-- 10 1 --]],
                 mask =
     [[False  True False True]
     [False False  True True]
     [False False False True]
     [ True False False True]],
           fill_value = 999999)
    In [6]: np.median(x, axis=0)
    Out[6]: array([ 5.,  5.,  5., 0.])
    In [7]: np.ma.median(y, axis=0).filled(0)
    Out[7]: 
    array([ 9.,  9.,  9.,  0.])
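
For use outside an interactive session, the same steps can be wrapped in a small function. This is only a sketch of the approach above: the function name column_medians_ignoring_zeros is made up for illustration and is not part of numpy.

```python
import numpy as np


def column_medians_ignoring_zeros(a):
    """Median of each column with zeros ignored.

    Hypothetical helper, not a numpy function. Columns that contain
    only zeros are fully masked, so filled(0) gives them a median of 0.
    """
    # Mask every entry equal to zero so it is left out of the median.
    masked = np.ma.masked_where(a == 0, a)
    # Take the median down each column, then replace masked results with 0.
    return np.ma.median(masked, axis=0).filled(0)


x = np.array([[10, 0, 10, 0],
              [1, 1, 0, 0],
              [9, 9, 9, 0],
              [0, 10, 1, 0]])

medians = column_medians_ignoring_zeros(x)
print(medians)
```

The result has one entry per column of the input, so the all-zero column still occupies its slot.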