Running or sliding median, mean and standard deviation

I am trying to calculate the running median, mean and std of a large array. I know how to calculate the running mean as below:

import numpy as np

def running_mean(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / float(N)

This works very efficiently, but I do not quite understand why (cumsum[N:] - cumsum[:-N]) / float(N) gives the mean (I borrowed it from someone else).

I tried to add another return statement to calculate the median, but it does not do what I want.

return (cumsum[N:] - cumsum[:-N]) / float(N), np.median(cumsum[N:] - cumsum[:-N])

Can anyone offer a hint on how to approach this problem? Thank you very much.

Huanian Zhang

Solution

The cumsum trick works because, with a 0 prepended, cumsum[i] is the sum of the first i elements of x, so cumsum[i+N] - cumsum[i] is the sum of the N elements x[i:i+N], and dividing by N gives the mean of that window. It is specific to sums and averages, and I don't think you can extend it simply to get median and std values. One approach to apply a generic NumPy reduction over a sliding/running window on a 1D array is to build the indices of every window, stacked as rows of a 2D array, and then apply the function along the second axis. To get those indices, you can use broadcasting.

For a running mean, it would look like this:

idx = np.arange(N) + np.arange(len(x)-N+1)[:,None]
out = np.mean(x[idx],axis=1)
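
To see what idx holds, here is a small sketch with made-up values for x and N (both are illustrative assumptions, not taken from the question). Each row of idx lists the positions of one window, so x[idx] is a 2D array with one window per row.

```python
import numpy as np

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])  # example data (assumed)
N = 3                                     # window length (assumed)

# Broadcasting a row of offsets against a column of start positions:
# row i holds the indices i, i+1, ..., i+N-1 of the i-th window.
idx = np.arange(N) + np.arange(len(x) - N + 1)[:, None]
print(idx)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

windows = x[idx]                 # shape (len(x) - N + 1, N)
out = np.mean(windows, axis=1)   # one mean per window: 6/3, 10/3, 11/3
```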

For running median and std, just replace np.mean with np.median and np.std respectively.
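
Putting this together, a minimal sketch of a generic helper might look like the following. The names running_stat and running_stat_view and the sample data are hypothetical, chosen only for illustration. Note that x[idx] copies every window, so it needs memory for roughly len(x) * N values. On NumPy 1.20 or later, np.lib.stride_tricks.sliding_window_view builds the same windows as a read-only view instead of copying them up front.

```python
import numpy as np

def running_stat(x, N, func):
    """Apply func (e.g. np.mean, np.median, np.std) to each length-N window of 1D x.

    Hypothetical helper, shown only to illustrate the indexing approach.
    """
    x = np.asarray(x)
    idx = np.arange(N) + np.arange(len(x) - N + 1)[:, None]
    return func(x[idx], axis=1)

def running_stat_view(x, N, func):
    """Same result via sliding_window_view (assumes NumPy >= 1.20)."""
    windows = np.lib.stride_tricks.sliding_window_view(np.asarray(x), N)
    return func(windows, axis=-1)

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])  # example data (assumed)
med = running_stat(x, 3, np.median)       # medians of the three windows: 2, 3, 4
std = running_stat(x, 3, np.std)          # population std (ddof=0) of each window
```

np.std uses ddof=0 by default. If the sample standard deviation is wanted, something like functools.partial(np.std, ddof=1) can be passed as func.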