Search code examples
pythonalgorithmnumpypandasaverage

Running average / frequency of time series data?


Given a data set with something like:

[2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 65, 75, 85, 86, 87, 88]

The values are always increasing (in fact it's time), and I want to find out a running average distance between the values. I am in effect trying to determine when the data goes from "1 every second" to "1 every 5 seconds" (or any other value).

I am implementing this in Python, but a solution in any language is most welcome.

The output I am looking for the sample input above, would be something like:

[(2, 1), (10, 5), (55, 10), (85, 1) ]

where, "2" would indicate where the distance between values started out being "1" and, and "10" would indicate where the distance shifted to being "5". (It would have to be exactly there, if the shift was detected a step later, wouldn't matter.)

I am looking for when the average distance between values changes. I realize there will be some kind of trade off between stability of the algorithm and sensitivity to changes in the input.

(Is Pandas or NumPy useful for this btw?)


Solution

  • In Python:

    a = [2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 34, 40, 45, 46, 50, 55]
    # zip() creates tuples of two consecutive values 
    # (it zips lists of different length by truncating longer list(s))
    # then tuples with first value and difference are placed in 'diff' list
    diff = [(x, y-x) for x, y in zip(a, a[1:])]
    # now pick only elements with changed difference 
    result = []
    for pair in diff:
        if not len(result) or result[-1][1]!=pair[1]: # -1 to take last element
            result.append(pair)