running median excluding zeros

I borrowed some code to calculate the running median of an array. But for each running array, I want to exclude the zero values. Below is the code:

def RunningMedian(seq, M):
    seq = iter(seq)
    s = []
    m = M // 2

    # Set up list s (to be sorted) and load deque with first window of seq
    s = [item for item in islice(seq, M)]
    d = deque(s)
    # Simple lambda function to handle even/odd window sizes    
    median = lambda : s[m] if bool(M&1) else (s[m-1]+s[m]) * 0.5
    # Sort it in increasing order and extract the median ("center" of the sorted window)
    s.sort()
    # remove zeros from the array
    s = np.trim_zeros(s)
    print s
    medians = [median()]
    for item in seq:
        old = d.popleft()          # pop oldest from left
        d.append(item)             # push newest in from right
        del s[bisect_left(s, old)] # locate insertion point and then remove old 
        insort(s, item)            # insert newest such that new sort is not required        
        s = np.trim_zeros(s)
        print s
        medians.append(median())
    return medians

I am testing the code, but it failed. My example is a = np.array([5 2 0 9 4 2 6 8]), I called this function RunningMedian(a,3). What I want for each running box is:

[2,5]
[2,9]
[4,9]
[2,4,9]
[2,4,6]
[2,6,8]

However, after I called the above function, it gives:

[2, 5]
[2, 9]
[4, 9]
[2, 9]
[2, 6]
[2, 8]

And also it returns wrong median values.
The returned median from the call is: [5, 9, 9, 9, 6, 8]

Anyone could help me to correct this issue? Thank you.

Solution

The main issue with your code is that throwing away zeros in s messes with the length of the objects used, which explains why you didn't get 3-length windows at the end.

I suggest another approach: using a proper function for median and ignoring those zero values locally. This way it's cleaner, and you don't need trim_zeros (it's really bad practice to import numpy just for this). Based on your function, here's what I came up:

from itertools import islice
from collections import deque
from bisect import bisect_left,insort

def median(s):
    sp = [nz for nz in s if nz!=0]
    print(sp)
    Mnow = len(sp)
    mnow = Mnow // 2
    return sp[mnow] if bool(Mnow&1) else (sp[mnow-1]+sp[mnow])*0.5

def RunningMedian(seq, M):
    seq = iter(seq)
    s = []
    m = M // 2

    # Set up list s (to be sorted) and load deque with first window of seq
    s = [item for item in islice(seq, M)]
    d = deque(s)
    ## Simple lambda function to handle even/odd window sizes    
    #median = lambda: s[m] if bool(M&1) else (s[m-1]+s[m])*0.5

    # Sort it in increasing order and extract the median ("center" of the sorted window)
    s.sort()
    medians = [median(s)]
    for item in seq:
        old = d.popleft()          # pop oldest from left
        d.append(item)             # push newest in from right
        del s[bisect_left(s, old)] # locate insertion point and then remove old 
        insort(s, item)            # insert newest such that new sort is not required        
        medians.append(median(s))
    return medians

Most of the change is in the new median function, and I moved the prints there. I also added your imports. Note that I'd approach this problem very differently, and it's highly possible that the current "fixed" version smells of duck tape.

Anyway, it seems to work as you want it to:

>>> a = [5, 2, 0, 9, 4, 2, 6, 8]

>>> RunningMedian(a,3)
[2, 5]
[2, 9]
[4, 9]
[2, 4, 9]
[2, 4, 6]
[2, 6, 8]
[3.5, 5.5, 6.5, 4, 4, 6]

And the reason why the medians were off in your version was that the parity of the window was determined from M, the input window width. If you discard the zeros, you'll end up with smaller (even-length) windows. In this case you don't need the middle (=second) element, but you need to average the two elements in the middle. Hence your erroneous output.