
Getting the median of portions of an array according to a boolean array in Python


I have two arrays of the same length: the first is a boolean array, and the second contains the corresponding values.

flag   = [0,0,0,1,1,0,0,0,1,1,1,1,0,1,1]
values = [1,5,6,8,5,6,2,0,1,9,3,8,3,6,2]

I want to return an array in which every position inside a contiguous run of 1s in the boolean array holds the median of the values in that run, and every other position holds 0.

For example:

flag   = [0,0,0,1,  1,  0,0,0, 1,  1,  1,  1, 0,1,1]
result = [0,0,0,6.5,6.5,0,0,0,5.5,5.5,5.5,5.5,0,4,4]
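The expected result can be checked by listing each run of ones explicitly. This minimal sketch uses the standard library's `itertools.groupby` and `statistics.median`, which the question itself does not use; `runs` is only an illustrative name.

```python
from itertools import groupby
from statistics import median

flag   = [0,0,0,1,1,0,0,0,1,1,1,1,0,1,1]
values = [1,5,6,8,5,6,2,0,1,9,3,8,3,6,2]

# Walk flag in runs of equal values and record, for each run of ones,
# its [start, stop) positions, the values there, and their median.
runs = []
pos = 0
for key, grp in groupby(flag):
    length = len(list(grp))
    if key == 1:
        run_values = values[pos:pos + length]
        runs.append((pos, pos + length, run_values, median(run_values)))
    pos += length

for start, stop, run_values, med in runs:
    print(f"flag[{start}:{stop}] -> values {run_values}, median {med}")
```

This lists `[8, 5]` with median 6.5, `[1, 9, 3, 8]` with median 5.5 and `[6, 2]` with median 4.0, matching the example above. It is only a readable check, not a fast solution.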

My inelegant approach is:

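import numpy as np

flag = np.asarray(flag)
values = np.asarray(values)

result = np.zeros(values.shape[0])
vect = []  # values in the current run of ones
idx = []   # positions of the current run of ones
for n in range(result.size):
    if flag[n] > 0:
        vect.append(values[n])
        idx.append(n)
    elif idx:
        # A run just ended: write its median to all of its positions.
        result[idx] = np.median(vect)
        vect = []
        idx = []
if idx:
    # A run that extends to the end of the array.
    result[idx] = np.median(vect)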

It works, but it isn't very Pythonic, and it is very slow because I work with very large arrays.


Solution

  • We can use np.diff to find the transitions between 0 and 1. Then we loop over pairs of 0-to-1 and 1-to-0 transitions and take the median of all values in between.

    The resulting loop runs once per group of ones rather than once per element.

    import numpy as np
    
    flag   = [0,0,0,1,1,0,0,0,1,1,1,1,0,1,1]
    values = [1,5,6,8,5,6,2,0,1,9,3,8,3,6,2]
    
    # Pad flag with a 0 at both ends so the procedure also works if flag starts or ends with 1.
    d = np.diff(np.concatenate([[0], flag, [0]]))
    
    begin = np.flatnonzero(d == 1)   # index of the first 1 in each run
    end = np.flatnonzero(d == -1)    # index just past the last 1 in each run
    
    result = np.zeros_like(values, dtype=float)
    
    # One iteration per run of ones; values[a:b] is exactly that run.
    for a, b in zip(begin, end):
        result[a:b] = np.median(values[a:b])
    
    print(result)
    # [ 0.   0.   0.   6.5  6.5  0.   0.   0.   5.5  5.5  5.5  5.5  0.   4.   4. ]
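If there are very many short runs, the Python-level loop over runs can itself become the bottleneck. One way to remove it, sketched here under the assumption that `flag` holds only 0/1 (or bool) and `values` contains no NaNs, is to sort all in-run values once with `np.lexsort`, keyed by run index and then by value, and read each run's median from the middle of its sorted block. `run_medians` is a hypothetical helper name, and the lexsort approach is a swapped-in technique rather than part of the answer above; it has not been benchmarked against the loop.

```python
import numpy as np


def run_medians(flag, values):
    """Fill each run of nonzero flags with the median of its values; 0 elsewhere.

    Sketch only: assumes flag holds 0/1 (or bool) and values has no NaNs.
    """
    mask = np.asarray(flag) != 0
    vals = np.asarray(values, dtype=float)
    result = np.zeros(vals.shape[0])

    # Same transition trick: +1 marks a run start, -1 marks one past its end.
    d = np.diff(np.concatenate(([0], mask.astype(int), [0])))
    begin = np.flatnonzero(d == 1)
    end = np.flatnonzero(d == -1)
    if begin.size == 0:
        return result  # no runs of ones at all

    # Label each element with the index of the run it belongs to.
    starts = np.zeros(mask.shape[0], dtype=int)
    starts[begin] = 1
    run_id = np.cumsum(starts) - 1

    # Sort in-run values by (run id, value) in one call;
    # np.lexsort treats the last key as the primary key.
    ids_in = run_id[mask]
    vals_in = vals[mask]
    sorted_vals = vals_in[np.lexsort((vals_in, ids_in))]

    # Each run now occupies a contiguous, sorted block of sorted_vals.
    lengths = end - begin
    offsets = np.cumsum(lengths) - lengths
    lo = offsets + (lengths - 1) // 2  # lower middle element
    hi = offsets + lengths // 2        # upper middle element (equals lo for odd lengths)
    medians = (sorted_vals[lo] + sorted_vals[hi]) / 2

    # Broadcast each run's median back to its positions.
    result[mask] = medians[ids_in]
    return result


print(run_medians([0,0,0,1,1,0,0,0,1,1,1,1,0,1,1],
                  [1,5,6,8,5,6,2,0,1,9,3,8,3,6,2]).tolist())
# [0.0, 0.0, 0.0, 6.5, 6.5, 0.0, 0.0, 0.0, 5.5, 5.5, 5.5, 5.5, 0.0, 4.0, 4.0]
```

Averaging the lower and upper middle elements follows the same convention as `np.median` for even-length runs. The per-run loop is simpler and probably fast enough when runs are few and long; the sort-based version costs one O(m log m) sort over the m flagged elements but no Python iteration per run.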