python, python-itertools, moving-average

Average on overlapping windows in Python


I'm trying to compute a moving average but with a set step size between each average. For example, if I were computing the average of a 4-element window every 2 elements:

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

This should produce the averages of the windows [1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8], and [7, 8, 9, 10]:

window_avg = [2.5, 4.5, 6.5, 8.5]

The end of my data will be truncated before processing, so the length is never a problem with respect to the window size.

I've read a bit about how to do moving averages in Python and there seems to be a lot of usage of itertools; however, the iterators go one element at a time and I can't figure out how to have a step size between each calculation of the average. (How to calculate moving average in Python 3?)

I have also done this before in MATLAB by creating a matrix of overlapping indices, indexing the data vector with it, and taking a column-wise mean (Create matrix by repeatedly overlapping a vector). However, since this vector is rather large (~70 000 elements, window of 450 samples, average every 30 samples), the computation would probably require too much memory.
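
For concreteness, that index-matrix idea would translate to NumPy roughly like this (just a sketch of what I mean; data[idx] copies every window out, which is the memory cost I'm worried about):

import numpy as np

data = np.arange(1, 11)                            # toy version of my data
window_size, stride = 4, 2

starts = np.arange(0, len(data) - window_size + 1, stride)
idx = starts[:, None] + np.arange(window_size)     # one row of indices per window
window_avg = data[idx].mean(axis=1)                # mean of each window (row)
print window_avg                                   # 2.5, 4.5, 6.5, 8.5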

Any help would be greatly appreciated. I am using Python 2.7.


Solution

  • One way to compute the average of a sliding window across a list in Python is to use a list comprehension. You can use

    >>> range(0, len(data), 2)
    [0, 2, 4, 6, 8]
    

    to get the starting indices of each window, and then numpy's mean function to take the average of each window. See the demo below:

    >>> import numpy as np
    >>> window_size = 4
    >>> stride = 2
    >>> window_avg = [ np.mean(data[i:i+window_size]) for i in range(0, len(data), stride)
                       if i+window_size <= len(data) ]
    >>> window_avg
    [2.5, 4.5, 6.5, 8.5]
    

    Note that the list comprehension does have a condition to ensure that it only computes the average of "full windows", or sublists with exactly window_size elements.
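
    As a quick check, dropping that condition would leave a partial trailing window, because the last start index (8) has only two elements remaining:

    >>> [ (i, data[i:i+window_size]) for i in range(0, len(data), stride) ]
    [(0, [1, 2, 3, 4]), (2, [3, 4, 5, 6]), (4, [5, 6, 7, 8]), (6, [7, 8, 9, 10]), (8, [9, 10])]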

    When run on a dataset of the size discussed in the OP, this method takes a little over 200 ms on my MacBook Air:

    In [5]: window_size = 450
    In [6]: data = range(70000)
    In [7]: stride = 30
    In [8]: timeit [ np.mean(data[i:i+window_size]) for i in range(0, len(data), stride)
                     if i+window_size <= len(data) ]
    1 loops, best of 3: 220 ms per loop
    

    On my machine it is about twice as fast as the itertools approach presented by @Abhijit:

    In [9]: timeit map(np.mean, izip(*(islice(it, i, None, stride) for i, it in enumerate(tee(data, window_size)))))
    1 loops, best of 3: 436 ms per loop
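
    For completeness, here is a self-contained Python 2.7 sketch of that itertools approach with the imports spelled out (my reconstruction of the one-liner; the helper name is mine, not @Abhijit's):

    import numpy as np
    from itertools import islice, izip, tee

    def strided_window_means(data, window_size, stride):
        # tee() makes window_size copies of the iterator; copy i is offset by i and
        # then stepped by `stride`, so izip() yields one full window per tuple and
        # stops automatically once no full window remains.
        columns = (islice(it, i, None, stride) for i, it in enumerate(tee(data, window_size)))
        return map(np.mean, izip(*columns))

    print strided_window_means([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4, 2)
    # [2.5, 4.5, 6.5, 8.5]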