I've seen a few implementations of Interpolation search in Python that uses
if key == arr[low]:
return low
After the while-loop, like this:
def interpolation_search(arr, key):
low = 0
high = len(arr) - 1
while arr[high] != arr[low] and arr[low] <= key <= arr[high]:
mid = low + ((key - arr[low]) * (high - low) // (arr[high] - arr[low]))
if key == arr[mid]:
return mid
elif key < arr[mid]:
high = mid - 1
else:
low = mid + 1
if key == arr[low]:
return low
return -1
What is that doing? I've run a bunch of tests with various kinds of lists (even distribution, uneven, etc) and searching for every item in the list but I haven't been able to see a difference regardless of if that if-statement is there or not.
An example of the algorithm with the if-statement: https://www.techiedelight.com/interpolation-search/
And without the if-statement: https://www.baeldung.com/java-interpolation-search
I've run the code on arr = range(200) as well as a sorted array with length 200 containing random integers from 0 - 1000, with a few duplicates. The if-statement doesn't change the outcome.
The if-statement below the while loop may be there to catch the case where the input array consists of only the same values, like [9, 9, 9, 9]
; or even of one value, [9]
. In that case, the while loop is never entered, and the if statement becomes necessary:
interpolation_search([9, 9, 9, 9], 9)
and
interpolation_search([9], 9)
will both enter the if-statement body.
Side notes:
When the value is not present, an exception should be raised, instead of a string returned.
The return value of a string when the value is not found is a bad choice, in my opinion, since there is still a single value returned, and it appeared the function ran successfully. But when later on, that value is used as an index into an array, there'll be an exception that has its roots in this function, but the exception won't show that.
in Python, you would just do arr.index(key)
, which replaces this whole function (and may be more optimised). Arguably, what you have here, is a C-style function, which happens to be coded in Python, but really shouldn't be used in Python. Also, this ignores NumPy, which would be more suitable for large arrays (the variable name arr
is more indicative of NumPy arrays though, instead of something like lst
or my_list
, the latter which tends to point to Python lists).