Search code examples
pythonalgorithmperformancesearchbinary-search

Binary Search with results returned as Indices


I'm working on an implementation of Binary Search in Python as part of a course (Algorithmic Toolbox on Coursera).

The challenge is to create an implementation of Binary Search that returns the index of a query (an integer) in an array. So for example, if I call Binary Search with an array of [1,5,7] and my query is 5, Binary Search should return 1. If the target is not in the array, then it should return -1. For example, I give it [1, 67, 88, 200] and my target is 999, then it should return -1.

This problem operates on the assumption that all test cases are feeding it arrays that are sorted and do not have duplicates.

My current implementation uses a helper function to actually conduct the Binary Search. This helper returns either the target value itself if it can be located, or -1 if it can't be found. The main function calling the helper takes the result, and if it isn't -1, it looks up the original index in a dictionary I create at the start of the main function.

In my own private test cases, the code functions with correct results, however upon submission, I am told that the code simply takes too long to run to be considered acceptable.

So my request is this: please review my code, and please help me figure out how to change it to run more efficiently. My code is posted below:

def single_binary_search(keys, target): 
    '''
    Takes an array of integers, keys, and searches for a value, target, in the array.  If the target is found to be 
    within the array, it returns the target value.  Else, it returns -1 if the array is considered invalid or if the
    target value does not exist within the keys array.  
    
    Input: 
        keys: an array of integers sorted in increasing order 
        target: an integer we wish to ascertain is within keys  
    Returns:
        the target value if it is located, or -1 if it is not or if the array is invalid 
    '''
    start = 0 # define start index
    end = len(keys)-1 # define end index
    if end < start: # check if array is invalid or if target is not in keys.  If so, return -1. 
        return -1 
    mp = start + (end-start)//2 # calculate the midpoint index
    if target == keys[mp]: # check to see if the target is at the midpoint.  
        return keys[mp] # return the target value if located in keys
    elif target < keys[mp]: # if target is less than mp, recursively call binary search on the lower array 
        return single_binary_search(keys[:mp], target)
    else:  # if target is greater than the mp, recursively call binary search on the upper array
        return single_binary_search(keys[mp+1:], target)

def binary_search(keys, query): 
    keys_dictionary = {k: v for v, k in enumerate(keys)} # Create a dictionary of keys to track the indices 
    result = single_binary_search(keys, query) # search for the individual query in keys and store the result 
    if result != -1: # if we found the target, use the keys dictionary to look up the index and return it 
        return keys_dictionary[result]
    else: # if the target was not found or the keys array was invalid, return -1
        return result

An earlier implementation tried to add two new arguments, start = 0, end = None, to the single_binary_search() function, but I scrapped it because I kept running into endless recursions that I knew how to fix only by tacking on specific if conditions to account for exceptions.


Solution

  • Here is a more conventional binary search (see also Wikipedia) that will work better:

    def binary_search(keys, query): 
        L, R = 0, len(keys) - 1
        while L <= R:
            M = L + (R - L) // 2
            if keys[M] == query:
                return M
            elif query < keys[M]:
                R = M - 1
            else:
                L = M + 1
        return -1
    
    print("keys:", list(range(0,20,2)))
    print(14, binary_search(list(range(0,20,2)), 14))
    print(15, binary_search(list(range(0,20,2)), 15))
    print(-1, binary_search(list(range(0,20,2)), -1))
    print(20, binary_search(list(range(0,20,2)), 20))
    print(0, binary_search(list(range(0,20,2)), 0))
    print(18, binary_search(list(range(0,20,2)), 18))
    

    Test case output:

    keys: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
    14 7
    15 -1
    -1 -1
    20 -1
    0 0
    18 9
    

    Some issues in your implementation are that slicing is expensive and best avoided, and also that your dictionary mechanic seems unnecessary, since binary search operates on the indexes of the array being searched, and therefore you shouldn't need to do any extra work to get the index of the match (if any).

    UPDATE: Here is a recursive approach that will also work:

    def binary_search(keys, query, L = 0, R = None): 
        R = R if R is not None else len(keys) - 1
        M = L + (R - L) // 2
        if L > R:
            return -1
        elif keys[M] == query:
            return M
        elif query < keys[M]:
            R = M - 1
        else:
            L = M + 1
        return binary_search(keys, query, L, R)