Given a data set with something like:
[2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 65, 75, 85, 86, 87, 88]
The values are always increasing (in fact it's time), and I want to find out a running average distance between the values. I am in effect trying to determine when the data goes from "1 every second" to "1 every 5 seconds" (or any other value).
I am implementing this in Python, but a solution in any language is most welcome.
The output I am looking for the sample input above, would be something like:
[(2, 1), (10, 5), (55, 10), (85, 1) ]
where, "2" would indicate where the distance between values started out being "1" and, and "10" would indicate where the distance shifted to being "5". (It would have to be exactly there, if the shift was detected a step later, wouldn't matter.)
I am looking for when the average distance between values changes. I realize there will be some kind of trade off between stability of the algorithm and sensitivity to changes in the input.
In Python:
a = [2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 34, 40, 45, 46, 50, 55]
# zip() creates tuples of two consecutive values
# (it zips lists of different length by truncating longer list(s))
# then tuples with first value and difference are placed in 'diff' list
diff = [(x, y-x) for x, y in zip(a, a[1:])]
# now pick only elements with changed difference
result = []
for pair in diff:
if not len(result) or result[-1][1]!=pair[1]: # -1 to take last element
result.append(pair)