With respect to the while loop to keep the search running while (low <= high)
, is there any consensus if it's better to use a '<' or '<=' since I see instances of both online?
The exit conditions of some binary search templates differ in the sense that some of them use Style A where as soon as target equals the midpoint we return target. B, on the other hand, only happens after the while loop is exited and returns nums[low]. Is this just a matter of personal preference?
A. if (target === nums[mid]) return target;
B. return nums[low]
after exiting while loop
The difference in the while
condition comes from the difference in the meaning of high
: sometimes it is the index after the last element that belongs to the range, and sometimes it represent the index of the last element itself. There is no consensus on which definition of high
is to be used.
However, you could align this choice to how ranges are usually represented in the language environment you use. For instance, in Python, range(a, b)
excludes b
, and in JavaScript arr.slice(a, b)
excludes b
too. So in those languages I would go for a definition of high
that excludes it from the range, and so your while
condition would be <
. In VBA there are LBound
and UBound
, which suggest an inclusive definition of high
, and so the while
condition would be <=
.
Either way the while
condition should reflect that the current range is non-empty.
The addition of this line in the loop
if (target === nums[mid]) return target;
...is another matter.
First of all, you would not return target
, as that doesn't really help the caller (they already know the value of target
, since they provided it). It is more appropriate to return the index where the value was found, so that the caller might act on that index:
if (target === nums[mid]) return mid;
For the same reasons, the alternative code would do:
return low;
On the one hand the first code variant offers a possibility to exit the loop early. In the extreme case we might even get lucky, and find a match in the very first iteration of the while
loop, and thereby save maybe 10 useless iterations that you would otherwise still do using the alternative code that does not include this if
. This may seem a good thing.
But, statistically, if the value is in the list, you'll have like a 50% change to only find it in the last possible iteration (when the range is just 1 wide). And if the value is not in the list, you will certainly have to perform all iterations. So it makes sense to make each iteration do as little as possible. And that is the reason not to include that if
statement in the loop, but only execute it once, after the loop.
To choose what is the best strategy for your case, implement them both, and do a benchmark comparison based on realistic data and searches.