Tags: arrays, algorithm, time-complexity, binary-search

Binary search performing worse than linear


I'm practicing rudimentary programming skills on leetcode.com (I highly recommend it. It's great) and I ran into an interesting result.

I'm trying to find the first fixed point (an index i with A[i] == i) of a given array A with strictly ascending values.

I did this in two ways. First, a binary search / linear-time hybrid:

    def first_fixed_point_hybrid(A):
        start, end = 0, len(A) - 1
        while start <= end:
            middle = (start + end) // 2   # integer division (avoids a float index in Python 3)
            if A[middle] == middle:
                # Some fixed point found: scan forward from start for the first one.
                for i in range(start, middle + 1):
                    if A[i] == i:
                        return i
            elif A[middle] > middle:
                end = middle - 1
            elif A[middle] < middle:
                start = middle + 1
        return -1

This is simple binary search until I find a match, and then I iterate over all potential first fixed points, one after the other, until I find the first match. This is where the linear part comes in. For instance, suppose len(A) = 1001 and A[500] = 500 is the only fixed point. Then I find it in one iteration of the binary search, but I still have to walk from index 0 to 500, one index at a time, looking for the first match. That's n/2 steps, in other words O(n).
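
To make that n/2 figure concrete, here is a small sketch using the hybrid function above (the array A[i] = 2*i - 500 is my own construction, chosen so it is strictly ascending with its only fixed point at index 500):

    A = [2 * i - 500 for i in range(1001)]   # strictly ascending, only fixed point at index 500

    # First binary iteration: middle = (0 + 1000) // 2 = 500 and A[500] == 500,
    # so the inner loop scans i = 0, 1, ..., 500 before hitting the match:
    # roughly n/2 comparisons on top of a single binary-search step.
    print(first_fixed_point_hybrid(A))   # -> 500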

The second solution is a pure binary search:

    def first_fixed_point_binary(A):
        start, end = 0, len(A) - 1
        fixed_point = -1
        while start <= end:
            middle = (start + end) // 2   # integer division
            if A[middle] == middle:
                fixed_point = middle      # remember it, then keep searching to the left
                end = middle - 1
            elif A[middle] > middle:
                end = middle - 1
            elif A[middle] < middle:
                start = middle + 1
        return fixed_point
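
For what it's worth, both versions agree on a quick sanity check (the small array below is just an illustration I made up):

    A = [-8, 0, 2, 5]   # strictly ascending; the first (and only) fixed point is at index 2
    print(first_fixed_point_hybrid(A))   # -> 2
    print(first_fixed_point_binary(A))   # -> 2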

I expected it to be better, but it's actually worse.

The first solution beats 97% of all submissions when it comes to runtime, and has a runtime of 40 ms.

The second solution only beats 72% of all submissions and has a runtime of 48 ms.

I'm curious why this happens. Is the second solution really the better one?


Solution

  • At if A[middle] == middle: the binary search has found *some* index in the run of fixed points, not necessarily the first one. The first version then finds the start of that run more efficiently, assuming the start lies 'close to' (for some value of 'close to') where the linear scan begins: each iteration of the binary-search loop is 'more expensive' (larger constant C) than a step of the linear scan, even though fewer iterations are needed (better O complexity). For specific degenerate input it should be possible to construct cases where the pure binary search is better, depending on the size of n and the distribution of the input.

    For example, the degenerate case of [0, 0, 0, ..., midpoint=1, 1, 1, ...] would be O(n) in the first version, as the linear loop would run from start=0 up to midpoint=n/2.¹ Using this degenerate data and a large enough value of n, the plain binary search will dominate, as it is bounded better. However, assuming 'well distributed random data', the approach with the linear probe could home in on a very small linear scan (smaller C for the final loops). (A rough timing sketch along these lines follows at the end of this answer.)

    This is similar to why a linear scan is faster for small arrays, and why a merge sort or quicksort (for example) may switch to an insertion sort for near-leaf sorting. Big-O describes the behavior as n -> infinity, without consideration of the constant costs.

    ¹ The scan could also run from midpoint=n/2 down to start=0; in that case the degenerate case shown above becomes the ideal case, and the corresponding degenerate case would be [0, 1, 1, 1, ...].
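
    A rough way to see the constant-factor effect outside LeetCode's timer is a benchmark along these lines (a sketch, not a measurement from this answer; both input arrays are constructions of mine that respect the strictly-ascending constraint, giving a best case and a worst case for the hybrid's linear scan):

        from timeit import timeit

        n = 1_000_001
        good = list(range(n))                        # every index is a fixed point: the hybrid's scan stops at i = 0
        bad = [2 * i - n // 2 for i in range(n)]     # only fixed point is at n // 2: the hybrid scans ~n/2 indices

        for name, A in [("good", good), ("bad", bad)]:
            t_hybrid = timeit(lambda: first_fixed_point_hybrid(A), number=10)
            t_binary = timeit(lambda: first_fixed_point_binary(A), number=10)
            print(name, round(t_hybrid, 3), round(t_binary, 3))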