In my experiment, I have a large 2D np.ndarray X of type float64 with dimensions 25x431080. I want to calculate the element-wise median along axis 0 to get an array of dimensions 1x431080. Assume that I distort one row of the original array in a way that I expect not to affect the median, e.g., by assigning it a value outside the range of the original elements. My problem is that the median computation does not return exactly the same array as before.
I am wondering whether this is a typical precision issue. Is there any way around it, perhaps with another type or function?
I am attaching a randomly generated example so that one can reproduce the issue:
import numpy as np
x = np.random.uniform(-1,1,(25,431080))
med1 = np.median(x, axis = 0)
x[13,:] = -100*np.ones(x.shape[1]) # distort one row to -100
med2 = np.median(x, axis = 0)
np.array_equal(med1, med2) # returns False
Note: re-computation of the median on the same array gives exactly the same result so there is no precision loss or any other change across different runs of the program.
I am not sure that your assumption is correct. Why should changing a value of the array to -100 not also change the median?
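import numpy as np

# Search randomly for an array whose median changes when one entry is set to -100
while True:
    x1 = np.round(np.random.uniform(-1, 1, 10), 2)
    x2 = x1.copy()
    x2[3] = -100  # distort a single element
    m1 = np.median(x1)
    m2 = np.median(x2)
    if m1 != m2:
        print(x1)
        print(x2)
        print(m1, m2)
        break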
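The same effect can be checked on an array shaped like the one in the question. The sketch below is only an illustration: the seed, the reduced column count (10,000 instead of 431,080), and the variable names are my own assumptions, not part of the original post. It assumes all values within a column are distinct, which is practically certain for random float64 draws. With 25 rows the median is the 13th smallest value, so replacing an entry with -100 should change the median exactly in those columns where the replaced entry was greater than or equal to the old median.

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed, for reproducibility only
x = rng.uniform(-1, 1, (25, 10_000))     # same 25 rows as the question, fewer columns

med1 = np.median(x, axis=0)
row = x[13].copy()                       # original values of the row about to be distorted

x[13, :] = -100                          # distort one row, as in the question
med2 = np.median(x, axis=0)

changed = med1 != med2                   # columns whose median moved
was_at_or_above = row >= med1            # columns where the old entry was >= the old median

# Both masks should coincide if the reasoning above holds
print(np.array_equal(changed, was_at_or_above))
# Fraction of changed columns; roughly 13/25 is expected for uniform random data
print(changed.mean())
```

If the two masks agree, the difference between med1 and med2 is explained by the ranks of the replaced values rather than by floating-point precision.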
Or maybe even simpler: take an example array [1, 2, 3] with the median 2. Replacing one of the initial values with -100 in general also changes the median. But sometimes you are lucky: if you replace a value smaller than the median with -100, the median stays the same, but if you replace a value larger than or equal to the median, the median changes.
x1 = [1, 2, 3] -> 2
x2 = [-100, 2, 3] -> 2
x3 = [1, -100, 3] -> 1
x4 = [1, 2, -100] -> 1
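These small cases can be reproduced with a few lines of NumPy. This is just a sketch; the helper name median_after_replacing is hypothetical and introduced here only for illustration.

```python
import numpy as np

def median_after_replacing(values, index, new_value):
    """Return the median of a copy of `values` with one entry replaced (hypothetical helper)."""
    distorted = list(values)             # copy so the input stays untouched
    distorted[index] = new_value
    return float(np.median(distorted))

base = [1, 2, 3]
print(base, "->", np.median(base))

# Replace each position in turn with -100 and record the resulting median
results = [median_after_replacing(base, i, -100) for i in range(len(base))]
for i, m in enumerate(results):
    print(f"replace index {i} with -100 -> median {m}")
```

Only the replacement at index 0, where the original value was below the median, leaves the median at 2.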