Search code examples
algorithmdata-structurestime-complexityheapheapsort

What is the purpose of having heap sort?


When we build a heap, the elements in the array get arranged in a particular order (ascending or descending) depending on whether it is max heap or min heap. Then what is the use of heap sort when building a heap itself arranges the elements in sorted order with less time complexity?

    void build_heap (int Arr[ ])
        {
           for(int i = N/2-1 ; i >= 0; i-- )
           {
              down_heapify (Arr, i, N);
           }
        }
    void heap_sort(int Arr[], int N)
    {
        build_heap(Arr);
        
        for(int i = N-1; i >= 1; i--)
        {
            swap(Arr[i], Arr[0]);
            down_heapify(Arr, 0, i+1);
        }
    }

Solution

  • Heap sort summed up

    Heap sort is an algorithm which can be summed up in two steps:

    • Convert the input array into a heap;
    • Convert the heap into a sorted array.

    The heap itself is not a sorted array.

    Let's look at an example:

    [9, 7, 3, 5, 4, 2, 0, 6, 8, 1]   # unsorted array
    
    convert into heap
    [9, 8, 3, 7, 4, 2, 0, 6, 5, 1]   # array representing a max-heap
    
    sort
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # sorted array
    

    If you look closely, you'll notice the second array in my example, which represents a heap, isn't quite sorted. The order of the element looks less random than in the original unsorted array; they look almost sorted in decreasing order; but they aren't completely sorted. 3 comes before 7, 0 comes before 6 in the array.

    So what is a heap?

    What is a heap?

    Note that in the previous section, I make a distinction between "a heap" and "an array representing a heap". Let's talk about what is a heap first, and then what is an array representing a heap.

    A max-heap is a binary tree with values on the nodes, which satisfies the two following properties:

    • the value on a child node is always lower than the value on its parent node;
    • the tree is almost-complete, in the sense that all branches of the tree have almost the same length, with a difference of at most 1 between the longest and the shorted branches; in addition, the longest branches must be on the left and the shortest branches must be on the right.

    In the example I gave, the heap constructed is this one:

              9
          /      \
         8        3
       /   \     / \
      7     4   2   0
     / \   / \ / \ / \
    6   5 1
    

    You can check that this binary tree satisfies the two properties of a heap: each child has a lower value than its parent, and all branches have almost the same length, with always 4 or 3 values per branch, and the longest branches being on the left and the shortest branches being on the right.

    What is an array representing a heap?

    Storing binary trees into arrays is usually pretty inconvenient, and binary trees are most generally implemented using pointers, kinda like a linked list. However, the heap is a very special binary tree, and its "almost-complete" property is super useful to implement it as an array.

    All we have to do is read the values row per row, left to right. In the heap above, we have four rows:

    9
    8 3
    7 4 2 0
    6 5 1
    

    We simply store these values in that order in an array:

    [9,  8, 3,  7, 4, 2, 0,  6, 5, 1]
    

    Notice that this is exactly the array after the first step of heap sort at the beginning of my post.

    In this array representation, we can use positions to determine which node is a child of which node: the node at position i has two children, which are at positions 2*i+1 and 2*i+2.

    This array is not a sorted array. But it represents a heap and we can easily use it to produce a sorted array, in n log(n) operations, by extracting the maximum element repeatedly.

    If heap-sort was implemented with an external binary tree, then we could use either a max-heap or a min-heap, and sort the array by repeatedly selecting the maximum element or the minimum element. However, if you try to implement heap-sort in-place, storing the heap as an array inside the array which is being sorted, you'll notice that it's much more convenient to use a max-heap than a min-heap, in order to sort the elements in increasing order by repeatedly selecting the max element and moving it to the end of the array.