Search code examples
algorithmsortingheapsort

Questions about William's Heapsort


If I'm not mistaken, the current implementations of heapsort use a modified version called treesort. I'm trying to understand the original algorithm as written by Williams.

OUTHEAP is called by SWOPHEAP who is never called. How does the algorithm work then? Is the replacement of a min heap by a max heap the only difference between this algorithm and the pseudocode found in every book?


Solution

  • You express multiple doubts/questions, so let me address them one by one:

    If I'm not mistaken, the current implementations of heapsort use a modified version called treesort.

    This is what you would conclude if you read the 1963-1964 papers by Williams, Floyd. Floyd used the name TREESORT for an improvement on the novelty found by Williams, who called the data structure a heap, and the sorting algorithm heap sort. The terminology has since evolved.

    Now we would consider both heapsort. Floyd essentially improved William's SETHEAP function so it would run more efficiently, but called his function TREESORT 3. When today we refer to tree sort, we mean something different: that is a sorting algorithm that uses a binary search tree as data structure, which has different properties than a binary heap data structure.

    OUTHEAP is called by SWOPHEAP who is never called. How does the algorithm work then?

    It was not William's purpose to use SWOPHEAP for his novel heap sort algorithm. He first introduced the concept of a heap (not yet the sorting algorithm), and provided a few utility functions to work with such a heap: SETHEAP, INHEAP, OUTHEAP and SWOPHEAP. Note that the wikipedia article on the heap data structure discusses the same operations and calls them construction (SETHEAP), insertion (INHEAP), extraction (OUTHEAP) and replacement (SWOPHEAP).

    After having introduced the concept of a heap and these functions to work with them, Williams continued:

    The procedures may be used for replacement-selection sorting, for sorting the elements of an array, or for choosing the current minimum of any set of items to which new items are added from time to time.

    He didn't claim that for all these purposes, all these utility functions had to be used. For the last example of application you would use SWOPHEAP and as it turns out, you don't need it for implementing heap sort.

    Is the replacement of a min heap by a max heap the only difference between this algorithm and the pseudocode found in every book?

    Certainly not "every book" introduces heaps in the same way. Some introduce the concept with a min heap implementation, while others will use a max heap for that purpose.

    A min heap may be a bit more natural to start with, since a heap can be seen as a kind of "partially sorted" collection, and for sorting, we also consider the default to be from least to greater (by convention).

    But realise that in order to use heap sort for sorting in that non-descending order, you actually need a max heap! If in heap sort you would use a min heap, you would end up with a list that is sorted from great to small.