python numpy floating-point precision median

NumPy median precision issues at scale

In my experiment, I have a large 2D np.ndarray X of type float64 with shape 25x431080. I want to compute the median along axis 0 (one median per column), which gives an array of 431080 values. Now assume that I distort one row of the original array in a way that I expect not to affect the median, e.g., by setting it to a value outside the range of the original elements. My problem is that the median computation then does not return exactly the same array as before.

I am wondering whether this is a typical precision issue. Is there any way around it, perhaps with another dtype or function?

Here is a randomly generated example that reproduces the issue:

import numpy as np
x = np.random.uniform(-1,1,(25,431080))
med1 = np.median(x, axis = 0)
x[13, :] = -100 # distort one row to -100
med2 = np.median(x, axis = 0)
np.array_equal(med1, med2) # returns False

Note: re-computation of the median on the same array gives exactly the same result so there is no precision loss or any other change across different runs of the program.

Solution

I am not sure that your assumption is correct. Why should changing a value of the array to -100 not also change the median?

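import numpy as np

# Draw small random arrays until overwriting one element changes the median.
while True:
    x1 = np.round(np.random.uniform(-1, 1, 10), 2)
    x2 = x1.copy()
    x2[3] = -100  # overwrite one element with an out-of-range value

    m1 = np.median(x1)
    m2 = np.median(x2)

    if m1 != m2:
        print(x1)
        print(x2)
        print(m1, m2)
        break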

Or maybe even simpler: take the array [1, 2, 3], whose median is 2. Changing one of the values to -100 will in general also change the median, although sometimes you get lucky. For an odd number of distinct values (as with your 25 rows), replacing a value smaller than the median with -100 leaves the median unchanged, but replacing the median itself or a larger value shifts the median down to the next smaller value. So this is not a precision issue.

x1 = [1,    2,    3] -> 2
x2 = [-100, 2,    3] -> 2
x3 = [1, -100,    3] -> 1
x4 = [1,    2, -100] -> 1
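
The same rule can be checked on an array shaped like the one in the question. The following is only a sketch under a few assumptions that are not in the original post: a fixed seed, fewer columns (10,000) to keep it quick, and the variable names `overwritten`, `changed`, and `same_side`. It first checks that the median changes exactly in the columns where the overwritten value was at or above the old median, and then shows one possible distortion (pushing each value far away on its own side of the median) that should leave the median untouched.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, assumed only for reproducibility
x = rng.uniform(-1, 1, (25, 10_000))    # same 25 rows as the question, fewer columns
med1 = np.median(x, axis=0)

overwritten = x[13].copy()              # keep the original row; x[13] alone is a view

# Distortion from the question: set the whole row to -100.
x_low = x.copy()
x_low[13, :] = -100
med2 = np.median(x_low, axis=0)

changed = med1 != med2
at_or_above = overwritten >= med1       # columns where row 13 was the median or above it
print(np.array_equal(changed, at_or_above))
print(changed.mean())                   # fraction of columns whose median moved

# Alternative distortion: move each value far away, but on its own side of the median.
# Where row 13 holds the median itself, the value is left as it is.
same_side = np.where(overwritten < med1, -100.0,
                     np.where(overwritten > med1, 100.0, overwritten))
x_side = x.copy()
x_side[13, :] = same_side
med3 = np.median(x_side, axis=0)
print(np.array_equal(med1, med3))
```

With 25 values per column, 13 of them are at or above the median, so roughly half of the columns are expected to change under the -100 distortion. The comparison is exact rather than tolerance-based because, for an odd number of values, the median is one of the array elements and no rounding is involved.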