I need to find the indices of all occurrences of a particular pattern in a string (or numerical vector). For example, given this boolean column (from a DataFrame):
z =
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 True
25 True
26 True
27 False
28 False
29 False
30 False
31 False
32 False
33 False
34 False
35 False
36 True
37 False
38 False
39 False
40 True
41 False
42 False
43 False
44 False
45 True
46 True
47 True
48 False
49 False
I am looking for a function that returns the index at which each run of three consecutive True values starts. For this example, I should get:
>>> result = some_function(z)
>>> print(result)
[24, 45]
In MATLAB this is quite easy with the function strfind, which does exactly what I need. I am sure that there is a similar function in Python.
I tried to use 'if ['True', 'True', 'True'] in z:' but I am doing something wrong.
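UPD: I found a very simple and general solution to such problems, which works with any datatype:
import numpy as np
from numpy.lib.stride_tricks import as_strided

def find_subarray_in_array(sub_array, large_array):
    # Overlapping windows of len(sub_array) elements; assumes a contiguous 1-D NumPy array.
    large_array_view = as_strided(large_array, shape=(len(large_array) - len(sub_array) + 1, len(sub_array)), strides=(large_array.dtype.itemsize,) * 2)
    return np.where(np.all(large_array_view == sub_array, axis=1))[0]
where "sub_array" is the pattern which should be found in the larger array "large_array". The function returns the zero-based positions at which each match starts, not the original index labels.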
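As a hedged variant of the same idea, the sketch below builds the windows with `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) instead of setting strides by hand, then maps the matching positions back to the index labels from the question. The names `find_subarray_positions`, `values`, and `labels` are illustrative assumptions, not part of the original post.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def find_subarray_positions(sub_array, large_array):
    """Return the zero-based start positions of sub_array in large_array."""
    sub_array = np.asarray(sub_array)
    large_array = np.asarray(large_array)
    # No match is possible for an empty pattern or one longer than the data.
    if len(sub_array) == 0 or len(sub_array) > len(large_array):
        return np.array([], dtype=int)
    # One row per window of len(sub_array) consecutive elements.
    windows = sliding_window_view(large_array, len(sub_array))
    return np.where(np.all(windows == sub_array, axis=1))[0]


# The example data from the question: labels 15..49 and their boolean values.
labels = np.arange(15, 50)
values = np.array([False] * 9 + [True] * 3 + [False] * 9 + [True]
                  + [False] * 3 + [True] + [False] * 4 + [True] * 3
                  + [False] * 2)

positions = find_subarray_positions([True, True, True], values)
result = labels[positions].tolist()
print(result)  # [24, 45]
```

Indexing `labels` with the returned positions is what turns the positions 9 and 30 into the labels 24 and 45.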
I'm assuming here that your inputs are lists:
inds = [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
        31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
        47, 48, 49]
bools = [False, False, False, False, False, False, False, False, False, True, True,
         True, False, False, False, False, False, False, False, False, False, True,
         False, False, False, True, False, False, False, False, True, True, True,
         False, False]
You then want to check for the pattern [True, True, True]:
pattern = [True, True, True]
The required comparison is then done by:
[inds[i] for i in range(len(bools)) if bools[i:i+len(pattern)] == pattern ]
Returns:
[24, 45]
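The comparison above can be wrapped in a small reusable function. This is a minimal sketch; the name `find_pattern` and its parameters are hypothetical. If the data lives in a pandas Series `z`, the two lists could presumably be obtained with `z.index.tolist()` and `z.tolist()`.

```python
def find_pattern(labels, values, pattern):
    """Return the label at the start of every occurrence of pattern in values."""
    n = len(pattern)
    if n == 0:
        return []
    # Stop at len(values) - n so that every slice has the full pattern length.
    return [labels[i] for i in range(len(values) - n + 1)
            if values[i:i + n] == pattern]


inds = list(range(15, 50))
bools = ([False] * 9 + [True] * 3 + [False] * 9 + [True] + [False] * 3
         + [True] + [False] * 4 + [True] * 3 + [False] * 2)

print(find_pattern(inds, bools, [True, True, True]))  # [24, 45]

# Overlapping matches are reported once per starting position.
print(find_pattern([0, 1, 2, 3], [True] * 4, [True, True, True]))  # [0, 1]
```

Note that a run of four True values counts as two overlapping matches; whether that is the desired behaviour depends on the use case.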