Tags: python, dataframe, strcmp

MATLAB-like "strncmp" in Python


I need to find the indices of all occurrences of a particular pattern in a string (or numerical vector). For example, given the boolean column (taken from a DataFrame):

z = 
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24     True
25     True
26     True
27    False
28    False
29    False
30    False
31    False
32    False
33    False
34    False
35    False
36     True
37    False
38    False
39    False
40     True
41    False
42    False
43    False
44    False
45     True
46     True
47     True
48    False
49    False

I am interested in a function that returns the starting index of every run of three consecutive True values. In this example, I should get:

>>> result = some_function(z)
>>> print(result)
[24, 45]

In MATLAB this is quite easy with the function strcmp, which does exactly what I need. I am sure there is a similar function in Python.

I tried to use `if ['True', 'True', 'True'] in z: ...`, but I am doing something wrong.
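For context, a small sketch of why that test cannot succeed, assuming `z` has been converted to a plain Python list (the short `bools` list here is illustrative data, not the column above). The `in` operator looks for a single element equal to the left operand, not for a contiguous run, and the string 'True' is not the boolean True.

```python
bools = [False, True, True, True, False]

# `in` asks whether one *element* of bools equals the whole list on the left,
# so a list is never "found" inside a flat list of booleans.
list_in_list = [True, True, True] in bools

# Comparing a slice of the same length is what tests for a contiguous run.
slice_matches = bools[1:4] == [True, True, True]

# The string 'True' and the boolean True are different values.
string_equals_bool = 'True' == True

print(list_in_list, slice_matches, string_equals_bool)
```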

UPD: I found a very simple and general solution to such problems, which works with any datatype:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def find_subarray_in_array(sub_array, large_array):
    large_array_view = as_strided(large_array, shape=(len(large_array) - len(sub_array) + 1, len(sub_array)), strides=(large_array.dtype.itemsize,) * 2)
    return np.where(np.all(large_array_view == sub_array, axis=1))[0]

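where "sub_array" is the pattern to be found in the larger array "large_array". Both are NumPy arrays, and "large_array" must be one-dimensional and contiguous, since the strides are taken from its item size. The function returns positions within the array, not index labels.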
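A usage sketch of the same windowing idea, assuming NumPy 1.20 or later: `sliding_window_view` builds overlapping windows like the `as_strided` call but works out the shape and strides itself. The name `find_pattern_starts` and the way the sample data is constructed are illustrative, not part of the original post.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def find_pattern_starts(pattern, values):
    """Return positions where `pattern` starts inside the 1-D array `values`."""
    pattern = np.asarray(pattern)
    values = np.asarray(values)
    # No window fits if the pattern is empty or longer than the data.
    if len(pattern) == 0 or len(pattern) > len(values):
        return np.array([], dtype=int)
    # One row per window of len(pattern) consecutive elements.
    windows = sliding_window_view(values, len(pattern))
    return np.where(np.all(windows == pattern, axis=1))[0]


# Rebuild the example column: labels 15..49, True at 24-26, 36, 40, 45-47.
labels = np.arange(15, 50)
z = np.zeros(len(labels), dtype=bool)
z[[9, 10, 11, 21, 25, 30, 31, 32]] = True

positions = find_pattern_starts([True, True, True], z)
result = labels[positions].tolist()  # map positions back to index labels
print(result)  # [24, 45]
```

Because the function returns positions, the lookup through `labels` is what turns them into the index values shown in the question.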


Solution

  • I'm assuming here that your inputs are lists:

    inds = [
        15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
        31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
        47, 48, 49]
    bools = [
        False, False, False, False, False, False, False, False, False, True, True,
        True, False, False, False, False, False, False, False, False, False, True,
        False, False, False, True, False, False, False, False, True, True, True,
        False, False]
    

    You then want to check for the pattern [True, True, True]:

    pattern = [True, True, True]
    

    The required comparison is then done by checking the slice that starts at each position against the pattern:

    [inds[i] for i in range(len(bools) - len(pattern) + 1) if bools[i:i + len(pattern)] == pattern]
    

    Returns:

    [24, 45]
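The comprehension above can be wrapped in a reusable function. This is a minimal sketch that assumes plain Python lists as inputs; `find_runs` is a hypothetical name, and the data is rebuilt programmatically rather than typed out. Note that overlapping matches are all reported, so four True values in a row yield two start indices for a pattern of three.

```python
def find_runs(labels, values, pattern):
    """Return the label at the start of every occurrence of `pattern` in `values`."""
    n = len(pattern)
    if n == 0:
        return []
    return [
        labels[i]
        for i in range(len(values) - n + 1)  # only positions where a full window fits
        if values[i:i + n] == pattern        # list slice compared element-wise
    ]


# Rebuild the example: labels 15..49, True at 24-26, 36, 40 and 45-47.
inds = list(range(15, 50))
bools = [False] * len(inds)
for label in (24, 25, 26, 36, 40, 45, 46, 47):
    bools[label - 15] = True

print(find_runs(inds, bools, [True, True, True]))        # [24, 45]
print(find_runs([0, 1, 2, 3], [True] * 4, [True] * 3))   # [0, 1]
```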