Why Numba doesn't improve this recursive function

I have an array of true/false values with a very simple structure:

# the real array has hundreds of thousands of items
positions = np.array([True, False, False, False, True, True, True, True, False, False, False], dtype=np.bool)

I want to traverse this array and output the places where changes happen (true becomes false or the contrary). For this purpose, I've put together two different approaches:

a recursive binary search (see if all values are the same, if not, split in two, then recurse)
a purely iterative search (loop through all elements and compare with the previous/next one)

Both versions give exactly the result that I want, however Numba has a greater effect on one than another. With a dummy array of 300k values, here are the performance results:

Performance results with array of 300k elements

pure Python binary-search runs in 11 ms

pure Python iterative-search runs in 1.1 s (100x slower than binary-search)

Numba binary-search runs in 5 ms (2 times faster than pure Python equivalent)

Numba iterative-search runs in 900 µs (1,200 times faster than pure Python equivalent)

As a result, when using Numba, binary_search is 5x slower than iterative_search, while in theory it should be 100x faster (it should be expected to run in 9 µs if it was properly accelerated).

What can be done to make Numba accelerate binary-search as much as it accelerates iterative-search?

Code for both approaches (along with a sample position array) is available on this public gist: https://gist.github.com/JivanRoquet/d58989aa0a4598e060ec2c705b9f3d8f

Note: Numba is not running binary_search() in object mode, because when mentioning nopython=True, it doesn't complain and happily compiles the function.

Solution

You can find the positions of value changes by using np.diff, there is no need to run a more complicated algorithm, or to use numba:

positions = np.array([True, False, False, False, True, True, True, True, False, False, False], dtype=np.bool)
dpos = np.diff(positions)
# array([ True, False, False,  True, False, False, False,  True, False, False])

This works, because False - True == -1 and np.bool(-1) == True.

It performs quite well on my battery powered (= throttled due to energy saving mode), and several years old laptop:

In [52]: positions = np.random.randint(0, 2, size=300_000, dtype=bool)          

In [53]: %timeit np.diff(positions)                                             
633 µs ± 4.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I'd imagine that writing your own diff in numba should yield similar performance.

EDIT: The last statement is false, I implemented a simple diff function using numba, and it's more than a factor of 10 faster than the numpy one (but it obviously also has much less features, but should be sufficient for this task):

@numba.njit 
def ndiff(x): 
    s = x.size - 1 
    r = np.empty(s, dtype=x.dtype) 
    for i in range(s): 
        r[i] = x[i+1] - x[i] 
    return r

In [68]: np.all(ndiff(positions) == np.diff(positions))                            
Out[68]: True

In [69]: %timeit ndiff(positions)                                               
46 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)