Search code examples
algorithmheapheapsortminmax-heap

Heapsort: Build max heap procedure explanation


i'm trying to understand the code to sort an array using Heap Sort (ref - https://www.sanfoundry.com/java-program-implement-heap-sort/)

"maxheap" function uses this calculation to get the left & right children of the Node (the same is used in multiple different examples/sites)

int left = 2*i;
int right = 2*i + 1;

What is the logic/reasoning for the above ?

Additionally, in the heapify function, maxheap fn is called with i = N/2 -> 0
(instead of i = 0 to N -1, for eg.) Any ideas on why this is done ?

public static void heapify(int arr[])
{
     N = arr.length-1;
     for (int i = N/2; i >= 0; i--)
         maxheap(arr, i);        
}

Solution

  • What is the logic/reasoning for the above ?

    Heap is a binary tree and in a 1-based index array as you can see below to get the the left child of node i you need the index 2*i and for the right child you get the index 2*i + 1.

    Array:
    [1, 2, 3, 4, 5, 6, 7]
    
    Binary tree:
    
             1
          /     \
       2           3
     /   \       /   \
    4     5     6     7
    

    Additionally, in the heapify function, maxheap fn is called with i = N/2 -> 0 (instead of i = 0 to N -1, for eg.) Any ideas on why this is done ?

    maxheap or max_heapify function is a procedure that gets an array and an index i as input to maintain the max heap property. The assumption for max_heapify is that the left and right subtrees of node i are max heaps but the node i might violate the max heap property (it might be smaller than its children).
    It means that if we call max_heapify on all the array elements all the nodes will maintain the max heap property and we should get a max heap. However if we start from the beginning of the array (root of the tree) we can not assume that left and right subtrees are already max heaps, therefore we need to start at the end (leaves of the tree) and go to the beginning. Additionally leaves don't have children so they are trivially already max heaps and we don't need to call max_heapify on them. In a heap with n elements the nodes in the subarray [floor(n/2)+1..n] are the leaves, so we can loop from floor(n/2) to 1 and call max_heapify only on internal nodes.

    For 0-based array:

    left(i):        2 * i + 1
    right(i):       2 * i + 2
    leaves:         [floor(n/2)..n-1]
    internal nodes: [0..floor(n/2)-1]