Search code examples
binary-search

Binary search start or end is target


Why is it that when I see example code for binary search there is never an if statement to check if the start of the array or end is the target?

import java.util.Arrays;

public class App {

    public static int binary_search(int[] arr, int left, int right, int target) {
        if (left > right) {
            return -1;
        }

        int mid = (left + right) / 2;

        if (target == arr[mid]) {
            return mid;
        }

        if (target < arr[mid]) {
            return binary_search(arr, left, mid - 1, target);
        }

        return binary_search(arr, mid + 1, right, target);
    }

    public static void main(String[] args) {
        int[] arr = { 3, 2, 4, -1, 0, 1, 10, 20, 9, 7 };

        Arrays.sort(arr);
        for (int i = 0; i < arr.length; i++) {
            System.out.println("Index: " + i + " value: " + arr[i]);
        }

        System.out.println(binary_search(arr, arr[0], arr.length - 1, -1));
    }
}

in this example if the target was -1 or 20 the search would enter recursion. But it added an if statement to check if target is mid, so why not add two more statements also checking if its left or right?


Solution

  • EDIT:

    As pointed out in the comments, I may have misinterpreted the initial question. The answer below assumes that OP meant having the start/end checks as part of each step of the recursion, as opposed to checking once before the recursion even starts.

    Since I don't know for sure which interpretation was intended, I'm leaving this post here for now.


    Original post:

    You seem to be under the impression that "they added an extra check for mid, so surely they should also add an extra check for start and end".

    The check "Is mid the target?" is in fact not a mere optimization they added. Recursively checking "mid" is the whole point of a binary search.

    When you have a sorted array of elements, a binary search works like this:

    1. Compare the middle element to the target
      1. If the middle element is smaller, throw away the first half
      2. If the middle element is larger, throw away the second half
      3. Otherwise, we found it!
    2. Repeat until we either find the target or there are no more elements.

    The act of checking the middle is fundamental to determining which half of the array to continue searching through.

    Now, let's say we also add a check for start and end. What does this gain us? Well, if at any point the target happens to be at the very start or end of a segment, we skip a few steps and end slightly sooner. Is this a likely event?

    For small toy examples with a few elements, yeah, maybe.

    For a massive real-world dataset with billions of entries? Hm, let's think about it. For the sake of simplicity, we assume that we know the target is in the array.

    We start with the whole array. Is the first element the target? The odds of that is one in a billion. Pretty unlikely. Is the last element the target? The odds of that is also one in a billion. Pretty unlikely too. You've wasted two extra comparisons to speed up an extremely unlikely case.

    We limit ourselves to, say, the first half. We do the same thing again. Is the first element the target? Probably not since the odds are one in half a billion.

    ...and so on.

    The bigger the dataset, the more useless the start/end "optimization" becomes. In fact, in terms of (maximally optimized) comparisons, each step of the algorithm has three comparisons instead of the usual one. VERY roughly estimated, that suggests that the algorithm on average becomes three times slower.

    Even for smaller datasets, it is of dubious use since it basically becomes a quasi-linear search instead of a binary search. Yes, the odds are higher, but on average, we can expect a larger amount of comparisons before we reach our target.

    The whole point of a binary search is to reach the target with as few wasted comparisons as possible. Adding more unlikely-to-succeed comparisons is typically not the way to improve that.

    Edit:

    The implementation as posted by OP may also confuse the issue slightly. The implementation chooses to make two comparisons between target and mid. A more optimal implementation would instead make a single three-way comparison (i.e. determine ">", "=" or "<" as a single step instead of two separate ones). This is, for instance, how Java's compareTo or C++'s <=> normally works.