I am looking for the best way to find a sequence of values of varying length within a longer pandas Series. For example, I have the values [92.6, 92.7, 92.9] (the sequence could also have length 2 or 5) and would like to find every place where this exact sequence occurs within the longer Series:
s = pd.Series([92.6,92.7,92.9,24.2,24.3,25.1,24.9,25.1,24.9,97.6,94.5,1.0,92.6,92.7,92.9,97.9,96.8,96.4,92.8,92.8,93.1,89.5,89.6])
(The actual Series has approximately 1000 elements.)
In this example, the correct result is indices 0, 1, 2 and 12, 13, 14.
Using rolling to identify the last row of each matching stretch:
target = [92.6, 92.7, 92.9]
m = s.rolling(len(target)).apply(lambda x: x.eq(target).all())
out = m[m.eq(1)].index
Output: [2, 14]
For all indices:
out = [x for end in m[m.eq(1)].index for x in range(end-len(target)+1, end+1)]
Output: [0, 1, 2, 12, 13, 14]
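The rolling approach above can be wrapped into a small helper. This is only a sketch: the function name find_sequence_rolling is hypothetical, it returns positions rather than index labels (the two coincide for the default RangeIndex used here), and it passes raw=True so that each window reaches the lambda as a NumPy array instead of a Series, which is usually faster.

```python
import numpy as np
import pandas as pd


def find_sequence_rolling(s, target):
    """Return the positions of all elements that belong to a match of `target`.

    Hypothetical helper; assumes `s` is a numeric Series.
    """
    target = np.asarray(target)
    n = len(target)
    # Each window yields 1.0 on an exact match and 0.0 otherwise;
    # the first n-1 (incomplete) windows yield NaN.
    m = s.rolling(n).apply(lambda w: (w == target).all(), raw=True)
    # Positions of the last element of each matching window.
    ends = np.flatnonzero(m.to_numpy() == 1)
    # Expand every end position into the full run of positions.
    return [i for end in ends for i in range(end - n + 1, end + 1)]


s = pd.Series([92.6, 92.7, 92.9, 24.2, 24.3, 25.1, 24.9, 25.1, 24.9, 97.6,
               94.5, 1.0, 92.6, 92.7, 92.9, 97.9, 96.8, 96.4, 92.8, 92.8,
               93.1, 89.5, 89.6])
result = find_sequence_rolling(s, [92.6, 92.7, 92.9])
```

Because the comparison is an exact float equality, it only matches values that are bit-for-bit identical to the target, which holds for the literals in this example.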
Alternatively, using numpy's sliding_window_view, giving the starting indices:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv
out = np.where((swv(s, len(target)) == target).all(axis=1))[0]
Output: array([ 0, 12])
For all indices:
out2 = (np.linspace(out[:,None], out[:,None]+len(target)-1, len(target))
          .ravel('F').astype(int)
       )
Output: array([ 0, 1, 2, 12, 13, 14])
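The windowed approach can also be sketched as one function that expands the start positions by broadcasting instead of linspace. The name find_sequence_windows and the atol parameter are assumptions made for illustration: since the values are floats, the sketch compares with np.isclose and a small absolute tolerance rather than exact equality, which may or may not be what you want for your data. It requires NumPy 1.20 or later for sliding_window_view.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def find_sequence_windows(s, target, atol=1e-9):
    """Return positions of all elements belonging to a match of `target`.

    Hypothetical helper; `atol` is an assumed tolerance for float comparison.
    """
    values = s.to_numpy()
    target = np.asarray(target, dtype=float)
    n = len(target)
    # No window can be formed for an empty or too-long target.
    if n == 0 or n > len(values):
        return np.array([], dtype=int)
    # Shape (len(values) - n + 1, n): one row per candidate window.
    windows = sliding_window_view(values, n)
    # A window matches when every element is within `atol` of the target.
    matches = np.isclose(windows, target, rtol=0, atol=atol).all(axis=1)
    starts = np.flatnonzero(matches)
    # Broadcast each start position against the offsets 0..n-1.
    return (starts[:, None] + np.arange(n)).ravel()


s = pd.Series([92.6, 92.7, 92.9, 24.2, 24.3, 25.1, 24.9, 25.1, 24.9, 97.6,
               94.5, 1.0, 92.6, 92.7, 92.9, 97.9, 96.8, 96.4, 92.8, 92.8,
               93.1, 89.5, 89.6])
positions = find_sequence_windows(s, [92.6, 92.7, 92.9])
```

If matches can overlap (for example, the target [1, 1] inside [1, 1, 1]), the returned array contains repeated positions; wrapping the result in np.unique removes them.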