Search code examples
pythonalgorithmsortingheapheapq

Understanding heapq sorting algorithm


I am reading the book, "Python from Novice to Expert" by Magnus Lie Hetland (Third Edition) and came across Heaps. There he discusses the sorting order of a heap list as "the order of the elements is important (even though it may look a bit haphazard.."

According to him the heap algorithm has 2 rules of ordering the elements:

1) Element at i is greater than element at position i//2

If one is not made then:

2) Element at position i is lower than elements at positions 2*i and 2*i+1

I ran a code checking these rules to see if they work all the time,

from heapq import *
from random import shuffle

data = list(range(10))
heap = []
shuffle(data)
for i in data:
    heappush(heap, i)
print(heap)
temp = False

#From p.240 
#The order of the elements isn’t as arbitrary as it seems. They aren’t in 
#strictly sorted order, but there is one
#guarantee made: the element at position i is always greater than the one 
#in position i // 2 (or, conversely,
#it’s smaller than the elements at positions 2 * i and 2 * i + 1). This is 
#the basis for the underlying heap
#algorithm. This is called the heap property.

for i in heap:
    print('___________')
    if heap[i] > heap[i//2]:
        print('First if: {}>{}'.format(heap[i],heap[i//2]))
        temp = True
        try:    
            if heap[i] < heap[2*i]:
                print('Second if: {}<{}'.format(heap[i],heap[i*2]))
                temp = True
        except IndexError:
                pass
        try:
            if heap[i] < heap[2*i+1]:
                print('Third if: {}<{}'.format(heap[i],heap[i*2+1]))
                temp = True
        except IndexError:
                pass
    else:
        try:    
            if heap[i] < heap[2*i]:
                print('Second if: {}<{}'.format(heap[i],heap[i*2]))
                temp = True
        except IndexError:
                pass
        try:
            if heap[i] < heap[2*i+1]:
                print('Third if: {}<{}'.format(heap[i],heap[i*2+1]))
                temp = True
        except IndexError:
                pass
    if not temp:
        print('No requirement was made')
    temp = False
    print('___________')

As expected there were inputs that achieved the goal and some not, such as:

[0, 1, 2, 3, 5, 8, 7, 9, 4, 6]
[0, 3, 1, 5, 4, 6, 2, 7, 8, 9]

My question is are there more rules for sorting when none of these rules apply?


Solution

  • As mentioned in the comments, the rule you had is stated in the framework of arrays with 1-based indices. Python lists are 0-based, and thus

    if a child is at heap[i], in Python heap the parent is at heap[(i - 1) // 2], not at heap[i // 2]. Conversely, if a parent is at heap[j], then its children are at heap[j * 2 + 1] and heap[j * 2 + 2]

    This is easy to see if you actually take the time to draw the heap:

       Example 1          Example 2         Python Index      1-based Index
           0                  0                  0                  1
       1       2          3       1          1       2          2       3
     3   5   8   7      5   4   6   2      3   4   5   6      4   5   6   7
    9 4 6              7 8 9              7 8 9              8 9 A