numpy where-clause python-3.4 string-comparison

Finding entries containing a substring in a numpy array?


I tried to find the entries of an array that contain a substring, using np.where with an in condition:

import numpy as np
foo = "aa"
bar = np.array(["aaa", "aab", "aca"])
np.where(foo in bar)

This returns only an empty array.
Why is that?
And is there a good alternative?


Solution

  • foo in bar tests whether any element of bar equals foo as a whole, so it evaluates to a single boolean (False here) and np.where never sees a per-element condition. Instead, we can use np.core.defchararray.find (also exposed as np.char.find), which returns the position of foo within each element of bar, or -1 where it is not found. Comparing that output against -1 gives a boolean mask of the matches, and np.flatnonzero turns the mask into their indices:

    np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)
    

    Sample run:

    In [91]: bar
    Out[91]: 
    array(['aaa', 'aab', 'aca'], 
          dtype='|S3')
    
    In [92]: foo
    Out[92]: 'aa'
    
    In [93]: np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)
    Out[93]: array([0, 1])
    
    In [94]: bar[2] = 'jaa'
    
    In [95]: np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)
    Out[95]: array([0, 1, 2])
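
For reference, the approach above can be put together as a small self-contained script. This is only a sketch: it uses np.char.find, the public alias of np.core.defchararray.find, because recent NumPy releases may warn when np.core is accessed directly. The variable names and the indices_containing helper are illustrative assumptions, not part of the original answer.

```python
import numpy as np

foo = "aa"
bar = np.array(["aaa", "aab", "aca"])

# `foo in bar` asks whether any element *equals* foo, so it yields one bool
# for the whole array rather than one result per element.
whole_array_check = foo in bar

# np.char.find gives the first index of foo inside each element,
# or -1 where foo does not occur.
positions = np.char.find(bar, foo)

# Boolean mask of the elements that contain foo.
mask = positions != -1

# Indices of the matches, and the matching entries themselves.
match_indices = np.flatnonzero(mask)
matches = bar[mask]


def indices_containing(arr, sub):
    """Hypothetical pure-Python alternative using the str `in` operator."""
    return np.array([i for i, s in enumerate(arr) if sub in s], dtype=int)


print(whole_array_check)
print(positions)
print(match_indices)
print(matches)
print(indices_containing(bar, foo))
```

Indexing with the mask (bar[mask]) is handy when the matching strings are wanted rather than their positions. The list-comprehension variant is likely slower on large arrays but does not depend on the NumPy string routines.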