Search code examples
c++algorithmlogic

Why should mid-value be used instead of mid-value - 1 for a recursive implementation of a binary search?


Background

An interactive book presents this function as an example of a binary search.

void GuessNumber(int lowVal, int highVal) {
   int midVal;            // Midpoint of low and high value
   char userAnswer;       // User response
   
   midVal = (highVal + lowVal) / 2;
   
   // Prompt user for input
   cout << "Is it " << midVal << "? (l/h/y): ";
   cin >> userAnswer;
   
   if( (userAnswer != 'l') && (userAnswer != 'h') ) { // Base case: found number
      cout << "Thank you!" << endl;                   
   }
   else {                                             // Recursive case: split into lower OR upper half
      if (userAnswer == 'l') {                        // Guess in lower half
         GuessNumber(lowVal, midVal);                 // Recursive call
      }
      else {                                          // Guess in upper half
         GuessNumber(midVal + 1, highVal);            // Recursive call
      }
   }
}

It presents the algorithm, and then they give an explanation about how to calculate the mid-value for the recursive call.

Because midVal has already been checked, it need not be part of the new window, so midVal + 1 rather than midVal is used for the window's new low side, or midVal - 1 for the window's new high side. But the midVal - 1 can have the drawback of a non-intuitive base case (i.e., midVal < lowVal, because if the current window is say 4..5, midVal is 4, so the new window would be 4..4-1, or 4..3). rangeSize == 1 is likely more intuitive, and thus the algorithm uses midVal rather than midVal - 1. However, the algorithm uses midVal + 1 when searching higher, due to integer rounding. In particular, for window 99..100, midVal is 99 ((99 + 100) / 2 = 99.5, rounded to 99 due to truncation of the fraction in integer division). So the next window would again be 99..100, and the algorithm would repeat with this window forever. midVal + 1 prevents the problem, and doesn't miss any numbers because midVal was checked and thus need not be part of the window.

Reasoning

I can see why, when the mid-value is used as a lower limit, the function is called as GuessNumber(midVal + 1, highVal). The explanation given with the limits 99 and 100 is pretty clear. However, I don't understand why, when the mid-value is used as the highest limit, the function is called as GuessNumber(lowVal, midVal) instead of GuessNumber(lowVal, midVal - 1).

The algorithm is missing the case when the value being searched is not within the range. However, it seems that they do make the assumption that it is (as a precondition). Therefore, the example they give with 4 and 5 does not make much sense.

Test case: the number being searched is 4

Let's assume that the value being searched is 4.

mid_value := (4+5) / 2 = 9 / 2 = 4.5 = 4 (due truncation) 

When the number is checked, it should return the position of 4, so there would be no error. The call GuessNumber(4, mid_value - 1) would have never been called. This means that the case for midVal < lowVal would have never occurred.

Test case: the number being searched is 4

Now, suppose the value is 5. The same calculation is done. When compared, the algorithm will execute the call GuessNumber(mid_value + 1, 5). This should return the position of 5. Again, GuessNumber(5, mid_value - 1) is not called.

Test case: changing the range

If I try to increase the range, let's say using 4 and 7 as limits, the function would never cause midVal < lowVal if called like GuessNumber(low_value, mid_value - 1). Consider the mid-value of the range between 4 and 7, namely 5 (due truncation). If the number being searched is 5, then the position is immediately returned. However, if the number being searched is 4 and the recursive call is done as GuessNumber(low_value, mid_value - 1) (GuessNumber(4, 5 - 1)), then the new mid-value will be 4 and no midVal < lowVal would occur. The postion of 4 is returned.

Some conclusions

I think it may be a logical error. The only way this could happen is if the number being searched is outside of the range (and specifically below the lower limit), but the algorithm is not testing a case when the number being searched is outside of the range. Again, it seems to be a precondition. Nevertheless, the explanation given caught my attention. They took the time to say that the error midVal < lowVal could happen, and they gave the example of the range 4 and 5.

Other Findings

I looked up the pseudocode in a discrete math book, but they use the case of recursive_binary_search(lowVal, midVal - 1) without worrying about the issue described above. I noticed they do some checking if the value is out of range, though.

procedure binary_search(i, j, x: integer, 1 ≤ i ≤ j ≤ n)
m := ⎣(i + j)/2⎦
if x = am then
    return m
else if (x < am and i < m) then
    return binary_search(i, m-1, x)
else if (x > am and j > m) then
    return binary_search(m + 1, j, x)
else return 0
{output is location of x in a1, a2, ..., an if it appears; otherwise it is 0}

I also saw this implementation in another data structures book. This does not make the precondition that the item being searched is inside the range, but they do check for that, and they still call the recursive function with the limits lower (first in this example) and mid - 1 (loc - 1 in this example).

void recBinarySearch(ArrayType a, int first, int last, ElementType item, bool &found, int &loc) {
    /*---------------------------
      Recursively search sub(list) a[first], ..., a[last] for item using binary search.

      Precondition: Elements of a are in ascending order; item has the same type as the array elements.
      Postcondition: found = true and loc = position of item if the search is successful; otherwise, found is false.
    -----------------------------*/

    if (first > last)
        found = false;
    else
    {
        loc = (first + last) / 2;
        if (item < a[loc])       // First half
            recBinarySearch(a, first, loc - 1, found, loc);
        else if (item > a[loc])  // Second half
            recBinarySearch(a, loc + 1, last, found, loc);
        else
            found = true;
    }
}

Question

I have searched on Google and other StackOverflow questions, but I cannot find something that points me in the right direction (most of the findings explain the overflow issue in the mid-value calculation, which is not the issue here). Is the explanation given in the book about using mid-value instead of mid-value - 1 for the upper limit correct? Is there an example that could demonstrate so, or am I missing something?

Thank you in advance for you time and help!


Solution

  • You're right to be confused by this example. With a range of 4..5, the guess (midVal) would be 4. The only way the line of code GuessNumber(lowVal, midVal-1); would be executed is if the user answered "low" which is:

    1. a lie, or
    2. their number is out of range.

    The example code doesn't account for search values outside the initial input range, which a binary search should do.