python numpy mean median

Running or sliding median, mean and standard deviation


I am trying to calculate the running median, mean and std of a large array. I know how to calculate the running mean as below:

import numpy as np

def running_mean(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / float(N)

This works very efficiently. But I do not quite understand why (cumsum[N:] - cumsum[:-N]) / float(N) gives the mean value (I borrowed it from someone else).

I tried changing the return statement to also calculate the median, but it does not do what I want:

return (cumsum[N:] - cumsum[:-N]) / float(N), np.median(cumsum[N:] - cumsum[:-N])

Can anyone offer a hint on how to approach this problem? Thank you very much.

Huanian Zhang


Solution

  • That cumsum trick is specific to sums and averages, and I don't think you can simply extend it to get median and std values. One approach for applying a generic reduction (np.mean, np.median, np.std and so on) in a sliding/running window on a 1D array is to build the indices of every window, stacked as the rows of a 2D array, and then apply the function along the second axis (axis=1), which runs across each window. To get those indices, you can use broadcasting.

    For a running mean, it would look like this:

    idx = np.arange(N) + np.arange(len(x)-N+1)[:,None]
    out = np.mean(x[idx],axis=1)
    

    For running median and std, just replace np.mean with np.median and np.std respectively.
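
Putting the pieces above together, here is a minimal sketch of the same broadcasting approach as one function. The name sliding_stats and the sample array are made up for illustration; they are not from the question. The last lines check the windowed mean against the cumsum version, since cumsum[i+N] - cumsum[i] is the sum of the window x[i:i+N].

```python
import numpy as np

def sliding_stats(x, N):
    """Running mean, median and std over windows of length N (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Row i of idx holds the indices i, i+1, ..., i+N-1 of the i-th window.
    idx = np.arange(N) + np.arange(len(x) - N + 1)[:, None]
    windows = x[idx]  # shape (len(x) - N + 1, N); fancy indexing copies the data
    # Reduce across each window (axis=1). np.std uses ddof=0 unless told otherwise.
    return windows.mean(axis=1), np.median(windows, axis=1), windows.std(axis=1)

# Illustrative data with one outlier, to show how mean and median differ.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
mean, median, std = sliding_stats(x, 3)

# Cross-check the mean against the cumsum trick from the question.
cumsum = np.cumsum(np.insert(x, 0, 0))
cumsum_mean = (cumsum[3:] - cumsum[:-3]) / 3.0
print(np.allclose(mean, cumsum_mean))
print(median)
```

Note that x[idx] materializes a (len(x) - N + 1) by N array, so memory use grows with the window length. For a very large array this may be the limiting factor, and processing the array in chunks would be one way around it.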