Search code examples
pythonalgorithmsortingheapsort

How do I improve my Python code about heap sort?


I tried to write a heap sort program myself for Leetcode 217. Contains Duplicate as below rather than using Python built-in sort method. Leetcode should accept heap sort method, but for some reasons I don't know, although my heap sort program works well, I still got the runtime rejection from Leetcode. Can anyone help?

Solved, below code is re-edited using Floyd algorithm to initialize heap and has passed Leetcode

def heapsort(nums):

    def swap(i, j):

        nums[i], nums[j] = nums[j], nums[i]


    def sift(start, size):

        l = (start << 1) + 1 # Do not forget () for <<
        r = l + 1
        largest = start
        if l <= size - 1 and nums[start] < nums[l]:
            largest = l
        if r <= size - 1 and nums[largest] < nums[r]:
            largest = r   
        if largest != start:
            swap(start, largest)
            sift(largest, size)


    size = len(nums)

    # Initialize heap (Floyd Algorithm)
    end = (size >> 1) - 1
    while end >= 0:
        sift(end, size)
        end -= 1
    swap(0, size - 1)
    size -= 1

    # Heapify recursively
    while size > 1:
        sift(0, size)
        swap(0, size - 1)
        size -= 1

Solution

  • Your code does way too much. You're rebuilding the entire heap with every item that you remove. So what should be an O(n log n) algorithm is instead O(n^2).

    Essentially, your code is doing this:

    while array is not empty
        rearrange array into a heap
        extract the smallest item
    

    Rearranging the heap takes, at best, O(n) time. And extracting the smallest takes O(log n). So your algorithm is O(n^2 + n log n).

    Actually, your method of building the heap from the bottom up is O(n log n) in itself. So your heap sort algorithm is actually O((n+1)*(n log n)). In any case, it's a highly sub-optimal algorithm.

    The idea behind heap sort is that you arrange the array into a heap one time. That is an O(n) operation. The algorithm is pretty simple:

    for i = heap.length/2 downto 1
        siftDown(i)
    

    This is called Floyd's algorithm, after its inventor.

    Note that we start in the middle of the array and sift down. The idea is that the last n/2 items are leaf nodes, so they can't sift down, anyway. By starting at n/2 and working backwards, we can heapify the entire array in O(n) time.

    After the array is arranged into a heap, we do the following:

    while heap is not empty
        output the item at heap[0]
        move the item at the end of the heap to heap[0]
        reduce the count of items by 1
        siftDown(0)
    

    The item at heap[0] is the smallest item remaining in the heap, so we output it. Then, there's no need to rebuild the entire heap. All you have to do is take the last item in the heap, place it at the top, and sift it down into position. The rest of the heap remains valid.

    Making those changes should reduce the running time, although I don't know if that's going to make your code acceptable. There is another way to check for duplicates. It requires O(n) extra space, but it's faster than sorting.

    The idea is to create a hash table and then go through the array, checking if the item is in the hash table. If not, add it. If it's already in the table, then it's a duplicate. As Harold pointed out, Python has a set type that makes this kind of thing easy to do.