Search code examples
sortingheapbinary-treebinary-search-treeheapsort

Why do we sort via Heaps instead of Binary Search Trees?


A heap can be constructed from a list in O(n logn) time, because inserting an element into a heap takes O(logn) time and there are n elements.

Similarly, a binary search tree can be constructed from a list in O(n logn) time, because inserting an element into a BST takes on average logn time and there are n elements.

Traversing a heap from min-to-max takes O(n logn) time (because we have to pop n elements, and each pop requires an O(logn) sink operation). Traversing a BST from min-to-max takes O(n) time (literally just inorder traversal).

So, it appears to me that constructing both structures takes equal time, but BSTs are faster to iterate over. So, why do we use "Heapsort" instead of "BSTsort"?

Edit: Thank you to Tobias and lrlreon for your answers! In summary, below are the points why we use heaps instead of BSTs for sorting.

  • Construction of a heap can actually be done in O(n) time, not O(nlogn) time. This makes heap construction faster than BST construction.
  • Additionally, arrays can be easily transformed into heaps in-place, because heaps are always complete binary trees. BSTs can't be easily implemented as an array, since BSTs are not guaranteed to be complete binary trees. This means that BSTs require additional O(n) space allocation to sort, while Heaps require only O(1).
  • All operations on heaps are guaranteed to be O(logn) time. BSTs, unless balanced, may have O(n) operations. Heaps are dramatically simpler to implement than Balanced BSTs are.
  • If you need to modify a value after creating the heap, all you need to do is apply the sink or swim operations. Modifying a value in a BST is much more conceptually difficult.

Solution

  • If the sorting method consists of storing the elements in a data structure and after extracting in a sorted way, then, although both approaches (heap and bst) have the same asymptotic complexity O(n log n), the heap tends to be faster. The reason is the heap always is a perfectly balanced tree and its operations always are O(log n), in a determistic way, not on average. With bst's, depending on the approah for balancing, insertion and deletion tend to take more time than the heap, no matter which balancing approach is used. In addition, a heap is usually implemented with an array storing the level traversal of the tree, without the need of storing any kind of pointers. Thus, if you know the number of elements, which usually is the case, the extra storage required for a heap is less than the used for a bst.

    In the case of sorting an array, there is a very important reason which it would rather be preferable a heap than a bst: you can use the same array for storing the heap; no need to use additional memory.