I have a simple numpy array.
array([[10,  0, 10, 0],
       [ 1,  1,  0, 0],
       [ 9,  9,  9, 0],
       [ 0, 10,  1, 0]])
I would like to take the median of each column of this array individually. However, there are 0 values in various places, and I would like to ignore them when calculating the medians.
To complicate things further, I would like any column that contains only 0 entries to get a median of 0. That way those columns act as placeholders, and the result still has exactly one median per column of the matrix.
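To pin down the requirement, here is a plain per-column loop that computes the intended result (the "hack it out" version). It is only an illustrative sketch: the function name is hypothetical and the loop is deliberately not vectorized, but any idiomatic numpy answer should produce the same output.

```python
import numpy as np

def column_medians_ignoring_zeros_loop(a):
    """Hypothetical reference implementation using an explicit loop.

    For each column, take the median of its nonzero entries.
    A column that is entirely zero keeps a median of 0.
    """
    a = np.asarray(a)
    result = np.zeros(a.shape[1], dtype=float)  # 0.0 acts as the placeholder
    for j in range(a.shape[1]):
        col = a[:, j]
        nonzero = col[col != 0]          # drop the zeros from this column
        if nonzero.size > 0:
            result[j] = np.median(nonzero)
        # otherwise the column was all zeros: leave result[j] == 0.0
    return result

x = np.array([[10, 0, 10, 0],
              [ 1, 1,  0, 0],
              [ 9, 9,  9, 0],
              [ 0, 10, 1, 0]])
print(column_medians_ignoring_zeros_loop(x))  # [9. 9. 9. 0.]
```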
The numpy.median documentation doesn't list any argument that does what I want (maybe I am spoiled by the many switches we get with R!):
numpy.median(a, axis=None, out=None, overwrite_input=False)
Can someone please shed some light on an effective way to do this that is in line with the spirit of numpy? I could hack it out, but then I feel like I'd have defeated the purpose of using numpy in the first place.
Thanks in advance.
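Use masked arrays: mask the zeros with np.ma.masked_where(x == 0, x), then call np.ma.median(y, axis=0).filled(0) to get the column medians. A column that is entirely masked (all zeros) gets a masked median, and .filled(0) replaces it with 0.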
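The same approach can be wrapped in a small reusable function. This is a sketch: the function name is hypothetical, and it uses np.ma.masked_equal, which is shorthand for masked_where(a == value, a). The result is always a float ndarray, because np.ma.median returns floating-point medians.

```python
import numpy as np

def column_medians_ignoring_zeros(a):
    """Hypothetical helper: column medians that skip zeros; all-zero columns give 0."""
    a = np.asarray(a)
    # Mask every zero so np.ma.median leaves it out of the calculation.
    masked = np.ma.masked_equal(a, 0)
    # Median down each column; a column that is fully masked comes back masked.
    med = np.ma.median(masked, axis=0)
    # Replace masked (all-zero) columns with 0 and return a plain ndarray.
    return np.ma.filled(med, 0)

x = np.array([[10, 0, 10, 0],
              [ 1, 1,  0, 0],
              [ 9, 9,  9, 0],
              [ 0, 10, 1, 0]])
print(column_medians_ignoring_zeros(x))  # [9. 9. 9. 0.]
```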
In [1]: x = np.array([[10, 0, 10, 0], [1, 1, 0, 0], [9, 9, 9, 0], [0, 10, 1, 0]])
In [2]: y = np.ma.masked_where(x == 0, x)
In [3]: x
Out[3]:
array([[10, 0, 10, 0],
[ 1, 1, 0, 0],
[ 9, 9, 9, 0],
[ 0, 10, 1, 0]])
In [4]: y
Out[4]:
masked_array(data =
[[10 -- 10 --]
[1 1 -- --]
[9 9 9 --]
[-- 10 1 --]],
mask =
[[False True False True]
[False False True True]
[False False False True]
[ True False False True]],
fill_value = 999999)
In [6]: np.median(x, axis=0)
Out[6]: array([ 5., 5., 5., 0.])
In [7]: np.ma.median(y, axis=0).filled(0)
Out[7]: array([ 9.,  9.,  9.,  0.])
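If you would rather avoid masked arrays, a different technique is to replace the zeros with NaN and use np.nanmedian (available since NumPy 1.9). This sketch assumes float output is acceptable and that the data contain no genuine NaN values, since those would also be ignored. The function name is hypothetical. An all-NaN column makes np.nanmedian emit a RuntimeWarning and return NaN, so the sketch silences that warning and maps NaN back to the 0 placeholder.

```python
import warnings
import numpy as np

def column_medians_ignoring_zeros_nan(a):
    """Hypothetical alternative: zeros -> NaN, then np.nanmedian per column."""
    # np.array (not asarray) makes a float copy, so the caller's array is untouched.
    b = np.array(a, dtype=float)
    b[b == 0] = np.nan
    with warnings.catch_warnings():
        # All-zero columns become all-NaN, which triggers "All-NaN slice encountered".
        warnings.simplefilter("ignore", category=RuntimeWarning)
        med = np.nanmedian(b, axis=0)
    # All-NaN columns come back as NaN; turn them into the 0 placeholder.
    return np.where(np.isnan(med), 0.0, med)

x = np.array([[10, 0, 10, 0],
              [ 1, 1,  0, 0],
              [ 9, 9,  9, 0],
              [ 0, 10, 1, 0]])
print(column_medians_ignoring_zeros_nan(x))  # [9. 9. 9. 0.]
```

Both versions give the same result here. The masked-array version keeps the "which values to ignore" information explicit in the mask, while the NaN version works on plain float arrays.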