I'm considering the following problem:
Given a sorted array of size n containing integers without duplicates, can we do better than a normal binary search by exploiting the property that the values are distinct integers, so a value adjacent to a probed value (+1 or -1) can only sit immediately next to it in the array?
The idea is that when you probe a value, you add a test to see whether the value you are looking for is adjacent to the current value (+1 or -1). If so, instead of being halved, the search interval is reduced to a single point: the index next to the current mid.
For instance, suppose you have an array with tab[i] = i for all indices, holding the values 0 to 99. Searching for 50, a normal binary search needs up to 7 probes (ceil(log2(100))) in the worst case. With the additional test, the first probe lands on index 49 ((0 + 99) >>> 1), whose value 49 is adjacent to 50, so the search interval is reduced to the single index 50 and we finish in two probes (at the price of an added test).
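Spelled out with the invocation convention used in the code below (lo = 0, hi = length - 1):

probe 1: lo = 0, hi = 99, mid = 49, tab[49] = 49 == 50 - 1, so lo = hi = 50
probe 2: lo = 50, hi = 50, mid = 50, tab[50] = 50, found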
This is just one example and is not representative of my data set; another example could be {0, 13223, 13225, 42341, 42342}, or any sorted set of values without repeats. For context, the arrays I'm manipulating are the keys (non-empty indices) of a sparse array implementation.
In the worst case, it seems we can conclude once the interval has shrunk to size 3 instead of 2, so roughly log2(n) - 1 probes.
In code this would give something like the following (Java here); invoke it with 0 as lo and the array length minus 1 as hi to search the whole array:
// This is Arrays.binarySearch(), but doesn't do any argument validation.
static int binarySearchGT(int[] array, int value, int lo, int hi) {
    while (lo <= hi) {
        final int mid = (lo + hi) >>> 1;
        final int midVal = array[mid];
        if (midVal < value) {
            lo = mid + 1;
        } else if (midVal > value) {
            hi = mid - 1;
        } else {
            return mid; // value found
        }
    }
    return ~lo; // value not present
}
becomes
static int binarySearch(int[] array, int value, int lo, int hi) {
    while (lo <= hi) {
        final int mid = (lo + hi) >>> 1;
        final int midVal = array[mid];
        if (midVal < value) {
            if (hi != mid && midVal == value - 1) {
                hi = mid + 1;
            }
            lo = mid + 1;
        } else if (midVal > value) {
            if (lo != mid && midVal == value + 1) {
                lo = mid - 1;
            }
            hi = mid - 1;
        } else {
            return mid; // value found
        }
    }
    return ~lo; // value not present
}
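For what it's worth, here is a throwaway driver (my own sketch, assuming it sits in the same class as binarySearch above) exercising the second example array; misses return the one's complement of the insertion point, as in Arrays.binarySearch:

static void demo() {
    int[] keys = {0, 13223, 13225, 42341, 42342};
    // Hit: the first probe lands directly on index 2.
    System.out.println(binarySearch(keys, 13225, 0, keys.length - 1)); // 2
    // Miss adjacent to an existing key: the first probe sees 13225 == 13224 + 1,
    // collapses the interval to the single index 1, and stops after the second probe.
    System.out.println(binarySearch(keys, 13224, 0, keys.length - 1)); // -3 (~2)
    // Miss with no adjacent key present: behaves exactly like a plain binary search.
    System.out.println(binarySearch(keys, 20000, 0, keys.length - 1)); // -4 (~3)
}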
Is my reasoning correct in thinking that this should (always) be better than a normal binary search in this particular discrete/non-repeating case of inputs? I do see that I have an additional branch and two boolean tests (one of which involves an addition), but still, with large inputs, can you exhibit a case where this strategy is clearly worse?
Does anyone know of a reference to a similar idea in the literature?
[Edited to better explain that not all values are present]
As you cannot guarantee that there will be a value in the array adjacent to what is searched for, in the worst case there isn't, meaning the cost is the same as a binary search. Worse, actually, because you have added an extra test for every element you inspect.
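To make that concrete, here is a rough sketch of my own (the probe counter and the multiples-of-10 key set are assumptions chosen to force the bad case, not something from the question). When no key in the array is adjacent to any searched value, the modified search inspects exactly the same elements as the plain one, so the extra comparisons per iteration are pure overhead:

static int probes; // incremented once per array element inspected

static int countingSearch(int[] a, int value, boolean adjacencyTest) {
    int lo = 0, hi = a.length - 1;
    while (lo <= hi) {
        final int mid = (lo + hi) >>> 1;
        probes++;
        final int midVal = a[mid];
        if (midVal < value) {
            if (adjacencyTest && hi != mid && midVal == value - 1) hi = mid + 1;
            lo = mid + 1;
        } else if (midVal > value) {
            if (adjacencyTest && lo != mid && midVal == value + 1) lo = mid - 1;
            hi = mid - 1;
        } else {
            return mid;
        }
    }
    return ~lo;
}

public static void main(String[] args) {
    int n = 1 << 20;
    int[] keys = new int[n];
    for (int i = 0; i < n; i++) keys[i] = 10 * i; // gaps of 10: no two values are adjacent

    probes = 0;
    for (int i = 0; i < n; i++) countingSearch(keys, 10 * i, false);
    int plain = probes;

    probes = 0;
    for (int i = 0; i < n; i++) countingSearch(keys, 10 * i, true);
    int withTest = probes;

    // Identical probe counts; the adjacency branch never fires, it only costs time.
    System.out.println("plain = " + plain + ", with adjacency test = " + withTest);
}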