Suppose I have a vector like so:
s = pd.Series(range(50))
The rolling mean over, let's say, a 2-element window is easily calculated:
s.rolling(window=2, min_periods=2).mean()
0 NaN
1 0.5
2 1.5
3 2.5
4 3.5
5 4.5
6 5.5
7 6.5
8 7.5
9 8.5
...
Now, instead of averaging 2 adjacent elements, I want the window to take every third element, still using only the last 2 of them. The result would be this vector:
0 NaN
1 NaN
2 NaN
3 1.5 -- (3+0)/2
4 2.5 -- (4+1)/2
5 3.5 -- (5+2)/2
6 4.5 -- ...
7 5.5
8 6.5
9 7.5
...
How can I achieve this efficiently?
Thanks!
You can use numpy.lib.stride_tricks.as_strided, which builds a view of an array from a shape and a tuple of strides, i.e. the number of bytes to step in each dimension when traversing the array.
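import numpy as np
arr = np.arange(10)
step = 3  # distance between the two elements of each window
# Row i of the view is the pair (arr[i], arr[i + step]); no data is copied.
strided = np.lib.stride_tricks.as_strided(arr, shape=(len(arr) - step, 2), strides=(arr.strides[0], step * arr.strides[0]))
result = strided.mean(axis=1)  # starts at index `step`; the leading NaN rows are not included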
output:
array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5])
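If you would rather stay in pandas and keep the leading NaN rows from the question, one possible alternative is to add shifted copies of the series. This is only a sketch: the helper name spaced_rolling_mean and its step/count parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd


def spaced_rolling_mean(s: pd.Series, step: int = 3, count: int = 2) -> pd.Series:
    """Mean of `count` elements spaced `step` apart, ending at each position.

    Hypothetical helper for illustration; not a pandas API.
    """
    # s.shift(i * step) aligns the element i * step positions back with the current row.
    shifted = [s.shift(i * step) for i in range(count)]
    # Adding Series propagates NaN, so rows without a full window stay NaN.
    return sum(shifted) / count


s = pd.Series(range(50))
result = spaced_rolling_mean(s, step=3, count=2)
print(result.head(10))
```

With step=3 and count=2 this computes (s + s.shift(3)) / 2, so index 3 gives (3 + 0) / 2 = 1.5 as in the question. It does count vectorized additions and avoids as_strided, which can read out-of-bounds memory if the shape or strides are wrong.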