Search code examples
numpybinary-search

Binary-search without an explicit array


I want to perform a binary-search using e.g. np.searchsorted, however, I do not want to create an explicit array containing values. Instead, I want to define a function giving the value to be expected at the desired position of the array, e.g. p(i) = i, where i denotes the position within the array.

Generating an array of values regarding the function would, in my case, be neither efficient nor elegant. Is there any way to achieve this?


Solution

  • What about something like:

    import collections
    
    class GeneratorSequence(collections.Sequence):
        def __init__(self, func, size):
            self._func = func
            self._len = size
    
        def __len__(self):
            return self._len
    
        def __getitem__(self, i):
            if 0 <= i < self._len:
                return self._func(i)
            else:
                raise IndexError
    
        def __iter__(self):
            for i in range(self._len):
                yield self[i]
    

    This would work with np.searchsorted(), e.g.:

    import numpy as np
    
    gen_seq = GeneratorSequence(lambda x: x ** 2, 100)
    np.searchsorted(gen_seq, 9)
    # 3
    

    You could also write your own binary search function, you do not really need NumPy in this case, and it can actually be beneficial:

    def bin_search(seq, item):
        first = 0
        last = len(seq) - 1
        found = False
        while first <= last and not found:
            midpoint = (first + last) // 2
            if seq[midpoint] == item:
                first = midpoint
                found = True
            else:
                if item < seq[midpoint]:
                    last = midpoint - 1
                else:
                    first = midpoint + 1
        return first
    

    Which gives identical results:

    all(bin_search(gen_seq, i) == np.searchsorted(gen_seq, i) for i in range(100))
    # True
    

    Incidentally, this is also WAY faster:

    gen_seq = GeneratorSequence(lambda x: x ** 2, 1000000)
    
    %timeit np.searchsorted(gen_seq, 10000)
    # 1 loop, best of 3: 1.23 s per loop
    %timeit bin_search(gen_seq, 10000)
    # 100000 loops, best of 3: 16.1 µs per loop