Search code examples
algorithmbinary-search

While and Exit Condition in Binary Search


  1. With respect to the while loop to keep the search running while (low <= high), is there any consensus if it's better to use a '<' or '<=' since I see instances of both online?

  2. The exit conditions of some binary search templates differ in the sense that some of them use Style A where as soon as target equals the midpoint we return target. B, on the other hand, only happens after the while loop is exited and returns nums[low]. Is this just a matter of personal preference?

A. if (target === nums[mid]) return target;

B. return nums[low] after exiting while loop


Solution

  • The difference in the while condition comes from the difference in the meaning of high: sometimes it is the index after the last element that belongs to the range, and sometimes it represent the index of the last element itself. There is no consensus on which definition of high is to be used.

    However, you could align this choice to how ranges are usually represented in the language environment you use. For instance, in Python, range(a, b) excludes b, and in JavaScript arr.slice(a, b) excludes b too. So in those languages I would go for a definition of high that excludes it from the range, and so your while condition would be <. In VBA there are LBound and UBound, which suggest an inclusive definition of high, and so the while condition would be <=.

    Either way the while condition should reflect that the current range is non-empty.

    The addition of this line in the loop

    if (target === nums[mid]) return target;
    

    ...is another matter.

    First of all, you would not return target, as that doesn't really help the caller (they already know the value of target, since they provided it). It is more appropriate to return the index where the value was found, so that the caller might act on that index:

    if (target === nums[mid]) return mid;
    

    For the same reasons, the alternative code would do:

    return low;
    

    On the one hand the first code variant offers a possibility to exit the loop early. In the extreme case we might even get lucky, and find a match in the very first iteration of the while loop, and thereby save maybe 10 useless iterations that you would otherwise still do using the alternative code that does not include this if. This may seem a good thing.

    But, statistically, if the value is in the list, you'll have like a 50% change to only find it in the last possible iteration (when the range is just 1 wide). And if the value is not in the list, you will certainly have to perform all iterations. So it makes sense to make each iteration do as little as possible. And that is the reason not to include that if statement in the loop, but only execute it once, after the loop.

    To choose what is the best strategy for your case, implement them both, and do a benchmark comparison based on realistic data and searches.