Search code examples
algorithmperformancebinary-searchcoding-efficiency

Difference in performance between the following two implementations of binary search


I came across these two implementations of binary search in the book "Competitive Programmer's Handbook" https://cses.fi/book/book.pdf.

Method 1:


int a = 0, b = n-1; 

while (a <= b) { 
    int k = (a+b)/2; 
    if (array[k] == x) { 
        // x found at index k
    }
    if (array[k] > x) 
        b = k-1; 
    else
        a = k+1;
}

Method 2:

int k = 0; 

for (int b = n/2; b >= 1; b /= 2){ 
    while (k+b < n && array[k+b] <= x) 
        k += b;
}
if (array[k] == x){ 
    // x found at index k
}

I guess method 2 is not exactly binary search. I understand that Both method 1 and method 2 have O(log n) complexity. Also the code for method 2 is simpler and therefore might result in fewer bugs.

My questions are:

  • Is there any improvement in performance when method-2 is used?
  • Does method-2 have any other advantage?

Solution

  • For such short code and so few differences, it is impossible to do any prediction. The time performance will depend on how the compiler optimizes, but also on the distribution of the keys in the array (in particular, the probability of a hit rather than a miss).

    I disagree with all comments "against" the second method (even claimed buggy when it is perfectly correct). It is based on a principle that potentially makes it better: there's only one test in the body of the loop.

    Having a comparison for equality (Method 1) gives the false feeling that the algorithm will terminate early when the key is found and make the search faster*. But this is not so true, because for half of the keys the full depth of the decision tree is anyway necessary, and this not counter-balanced by the fact that there are two comparisons instead of one.

    *In fact, you just spare one test on average !


    Only benchmarking can tell you if one of the methods is faster with particular test cases. My bet is that the distributions of the running times overlap a lot. (Not counting that it is virtually impossible to benchmark such a fast algorithm in a way that is representative of its behavior in real context.)


    Last comment: the method 2 is a binary search, while in fact method 1 is ternary !