In this code, there is a binary-search algorithm. To signal that the item is not found, a negative number is returned, which in the code is done with return ~mid
(see line 92).
public int Search(string name)
{
int low = 0;
int high = _sortedPositions.Count - 1;
int mid = 0;
while (low <= high)
{
mid = (low + high) >> 1;
_stream.Position = _sortedPositions[mid];
var curName = _br.ReadString();
var comp = string.Compare(name, curName);
if (comp == 0)
return mid;
if (comp < 0)
high = mid - 1;
else
low = mid + 1;
}
return ~mid;
}
My question is: why is it done like this? Is there any advantage over returning a fixed negative int, like return -1
?
Yes, there is a benefit (if the function returns the correct value, which it does not, see more below).
The negative number indicates that the key that was searched for was not found in the collection, just as returning -1 would.
However, what you gain, compared to simply returning -1, is that you can flip the bits of this negative value once again to get back the index where the value ought to be if it were to follow the sort order.
You could use it like this:
int index = Search("Some name");
if (index < 0)
// insert at position (~index)
Now, this particular search function has a flaw in which it will incorrectly report the wrong index in a boundary case where it has found the last item to compare to, but the name it searches for should be after this item.
As an example, try adding the following two strings in the positions reported by the Search function: A
and then Z
.
It will add them in the wrong order, Z
before A
.
The reason for this is that it found the single item A
that it should compare against at index 0, but Z
should be added after index 0, not at index 0.
However, the fix for this bug is simple:
mid
outside of the looplow
or high
This way the final mid
value will be correct. Here's the final function:
public int Search(string name)
{
int low = 0;
int high = _sortedPositions.Count - 1;
int mid = low + (high - low) / 2; // note #2
while (low <= high)
{
var curName = _sortedPositions[mid]; // note #1
var comp = string.Compare(name, curName);
if (comp == 0)
return mid;
if (comp < 0)
high = mid - 1;
else
low = mid + 1;
mid = low + (high - low) / 2; // note #2
}
return ~mid;
}
(note #1: I switched sortedPositions
to a simple List<string>
while I was spelunking with your code to make it run. Obviously this will be different for your stream based example.)
(note #2: The way I calculate the mid
value, by calculating the difference and dividing by 2, is the suggested way to do this. If the collection has so many values that low + high
will overflow, the suggested way will not.)
A simpler fix, employed by many implementations, is that at the point where you have no more items to look at, low == mid
(with the above change), so a different fix for the bug would be to return ~low
instead of ~mid
, and then minimal changes have to be performed:
public int Search(string name)
{
int low = 0;
int high = _sortedPositions.Count - 1;
int mid;
while (low <= high)
{
mid = low + (high - low) / 2; // note #2
var curName = _sortedPositions[mid]; // note #1
var comp = string.Compare(name, curName);
if (comp == 0)
return mid;
if (comp < 0)
high = mid - 1;
else
low = mid + 1;
}
return ~low;
}