When we build a heap, the elements in the array get arranged in a particular order (ascending or descending) depending on whether it is max heap or min heap. Then what is the use of heap sort when building a heap itself arranges the elements in sorted order with less time complexity?
void build_heap (int Arr[ ])
{
for(int i = N/2-1 ; i >= 0; i-- )
{
down_heapify (Arr, i, N);
}
}
void heap_sort(int Arr[], int N)
{
build_heap(Arr);
for(int i = N-1; i >= 1; i--)
{
swap(Arr[i], Arr[0]);
down_heapify(Arr, 0, i+1);
}
}
Heap sort is an algorithm which can be summed up in two steps:
The heap itself is not a sorted array.
Let's look at an example:
[9, 7, 3, 5, 4, 2, 0, 6, 8, 1] # unsorted array
convert into heap
[9, 8, 3, 7, 4, 2, 0, 6, 5, 1] # array representing a max-heap
sort
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] # sorted array
If you look closely, you'll notice the second array in my example, which represents a heap, isn't quite sorted. The order of the element looks less random than in the original unsorted array; they look almost sorted in decreasing order; but they aren't completely sorted. 3 comes before 7, 0 comes before 6 in the array.
So what is a heap?
Note that in the previous section, I make a distinction between "a heap" and "an array representing a heap". Let's talk about what is a heap first, and then what is an array representing a heap.
A max-heap is a binary tree with values on the nodes, which satisfies the two following properties:
In the example I gave, the heap constructed is this one:
9
/ \
8 3
/ \ / \
7 4 2 0
/ \ / \ / \ / \
6 5 1
You can check that this binary tree satisfies the two properties of a heap: each child has a lower value than its parent, and all branches have almost the same length, with always 4 or 3 values per branch, and the longest branches being on the left and the shortest branches being on the right.
Storing binary trees into arrays is usually pretty inconvenient, and binary trees are most generally implemented using pointers, kinda like a linked list. However, the heap is a very special binary tree, and its "almost-complete" property is super useful to implement it as an array.
All we have to do is read the values row per row, left to right. In the heap above, we have four rows:
9
8 3
7 4 2 0
6 5 1
We simply store these values in that order in an array:
[9, 8, 3, 7, 4, 2, 0, 6, 5, 1]
Notice that this is exactly the array after the first step of heap sort at the beginning of my post.
In this array representation, we can use positions to determine which node is a child of which node: the node at position i
has two children, which are at positions 2*i+1
and 2*i+2
.
This array is not a sorted array. But it represents a heap and we can easily use it to produce a sorted array, in n log(n) operations, by extracting the maximum element repeatedly.
If heap-sort was implemented with an external binary tree, then we could use either a max-heap or a min-heap, and sort the array by repeatedly selecting the maximum element or the minimum element. However, if you try to implement heap-sort in-place, storing the heap as an array inside the array which is being sorted, you'll notice that it's much more convenient to use a max-heap than a min-heap, in order to sort the elements in increasing order by repeatedly selecting the max element and moving it to the end of the array.