Search code examples
python-3.xalgorithmbinary-search

How to choose indices of subintervals in binary search?


iterative binary search algorithm. I am wrote algorithm in 2 different ways. the change i made was high = len(data) and high = len(data) -1 . In both cases algorithm runs fine. But in most of the sites they show high = len(data) -1 is correct way. so is using -1 is better and why?

1st code)

def iterative_binary_search(data, target):
    low = 0
    high = len(data)               # this line is where I need help
    while low <= high:
        mid = (low + high) // 2
        if target == data[mid]:
            return True
        elif target < data[mid]:
            high = mid - 1
        else:
            low = mid + 1
    return False

2nd code)

def iterative_binary_search(data, target):
    low = 0
    high = len(data) -1           # this line is where I need help
    while low <= high:
        mid = (low + high) // 2
        if target == data[mid]:
            return True
        elif target < data[mid]:
            high = mid - 1
        else:
            low = mid + 1
    return False

Solution

  • One of the codes doesn't run fine.

    Calling ibs1 the first one, with high=len(data), and ibs2 the second one, with high = len(data)-1, I get :

    >>> haystack = [0,1,2,3,4,5,6,7,8,9]
    >>> ibs2(haystack, 11)
    False
    >>> ibs1(haystack, 11)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 6, in ibs1
    IndexError: list index out of range
    

    How to decide between len(data) and len(data) - 1

    You need to decide what low and high stand for, and make it very clear in your mind. When low=3 and high=6, what does it mean? Does it mean we are searching between list indices 3 and 6 included? Or excluded? That's up to you to decide. If it's included, then you should use high = len(data) - 1, because that is the index of the highest element of the array. If it's excluded, you should use high = len(data), because that is one past the index of the highest element in the array.

    Both decisions are fine. But then this decision must reflect in the logic of the remaining of the code.

    Hence, this code would be correct as well:

    def ibs3(haystack, needle):
      low = 0
      high = len(haystack)
      while low < high:
        mid = (low + high) // 2
        if needle == haystack[mid]:
          return True
        elif needle < haystack[mid]:
          high = mid
        else:
          low = mid + 1
      return False
    

    Note that in python, the convention is most often to include low and exclude high. For instance, print(list(range(7, 10))) prints [7, 8, 9]: no number 10 in there!