Binary-search without an explicit array

I want to perform a binary-search using e.g. np.searchsorted, however, I do not want to create an explicit array containing values. Instead, I want to define a function giving the value to be expected at the desired position of the array, e.g. p(i) = i, where i denotes the position within the array.

Generating an array of values regarding the function would, in my case, be neither efficient nor elegant. Is there any way to achieve this?

Solution

What about something like:

import collections

class GeneratorSequence(collections.Sequence):
    def __init__(self, func, size):
        self._func = func
        self._len = size

    def __len__(self):
        return self._len

    def __getitem__(self, i):
        if 0 <= i < self._len:
            return self._func(i)
        else:
            raise IndexError

    def __iter__(self):
        for i in range(self._len):
            yield self[i]

This would work with np.searchsorted(), e.g.:

import numpy as np

gen_seq = GeneratorSequence(lambda x: x ** 2, 100)
np.searchsorted(gen_seq, 9)
# 3

You could also write your own binary search function, you do not really need NumPy in this case, and it can actually be beneficial:

def bin_search(seq, item):
    first = 0
    last = len(seq) - 1
    found = False
    while first <= last and not found:
        midpoint = (first + last) // 2
        if seq[midpoint] == item:
            first = midpoint
            found = True
        else:
            if item < seq[midpoint]:
                last = midpoint - 1
            else:
                first = midpoint + 1
    return first

Which gives identical results:

all(bin_search(gen_seq, i) == np.searchsorted(gen_seq, i) for i in range(100))
# True

Incidentally, this is also WAY faster:

gen_seq = GeneratorSequence(lambda x: x ** 2, 1000000)

%timeit np.searchsorted(gen_seq, 10000)
# 1 loop, best of 3: 1.23 s per loop
%timeit bin_search(gen_seq, 10000)
# 100000 loops, best of 3: 16.1 µs per loop