
Why is the time complexity of heapq.merge higher than that of heapq.heapify?


Merging k sorted lists containing a total of n elements using heapq.merge is supposed to have a complexity of O(n log k). This is not stated directly in the documentation, but it can be inferred from the Wikipedia article's description of the direct k-way merge. It also seems fairly intuitive: you create a heap from the first (smallest) element of each of the k lists, then repeatedly pop the root of that heap and push the next element from the list the popped element came from, until the heap (and the lists feeding it) are empty.
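The procedure described above can be sketched in a few lines. This is only an illustration of the idea, not the actual implementation of heapq.merge; the function name `kway_merge` and the sample lists are made up for this example. The heap never holds more than k entries, so each of the n pops and pushes costs O(log k), which is where O(n log k) comes from.

```python
import heapq

def kway_merge(sorted_lists):
    """Merge k sorted lists into one sorted list using a heap of at most k entries.

    Hypothetical helper for illustration; heapq.merge does the same job lazily.
    """
    # Seed the heap with the first element of each non-empty list.
    # Each entry is (value, list index, position in that list); the list
    # index breaks ties between equal values.
    heap = [(lst[0], i, 0) for i, lst in enumerate(sorted_lists) if lst]
    heapq.heapify(heap)  # O(k)

    merged = []
    while heap:
        value, i, pos = heapq.heappop(heap)  # O(log k)
        merged.append(value)
        nxt = pos + 1
        if nxt < len(sorted_lists[i]):
            # Refill from the same list the popped element came from.
            heapq.heappush(heap, (sorted_lists[i][nxt], i, nxt))  # O(log k)
    return merged

lists = [[1, 4, 9], [2, 3, 10], [], [5, 6, 7, 8]]
result = kway_merge(lists)
print(result)
```

The output should match `list(heapq.merge(*lists))`, and it is fully sorted, which is the key difference from what heapify produces.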

What bugs me is that the complexity of this algorithm is higher than that of heapq.heapify applied to the same n elements supplied as a single unsorted list. The latter is known to be O(n).

This does not seem to make sense; I would expect it to be the other way round. It should be more difficult to heapify n unordered elements than to heapify the same elements when they are already sorted in k lists.

What am I missing here?


Solution

  • Direct k-way merge produces a sorted array from your input of sorted arrays.

    Creating a heap from all your n elements in unsorted order produces, well, a heap.

    A heap is not a sorted list. It only guarantees that each parent is no larger than its children, so the smallest element sits at the root. Producing a sorted list from a heap still takes a lot of work, as discussed in articles about heapsort, which is an O(n log n) sorting algorithm. So creating the heap may be O(n), but the output is different from that of a k-way merge: a heap is a weaker result than a fully sorted array. Thus it makes sense that heapify has a lower time complexity than k-way merge.
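To make the difference in output concrete, here is a small sketch with made-up sample data; the helper `is_min_heap` is a hypothetical name introduced only for this example. It shows that heapify establishes the heap property without sorting the list, and that getting sorted order out of the heap takes a further n pops at O(log n) each.

```python
import heapq

data = [9, 4, 7, 1, 8, 2, 6, 3, 5]

def is_min_heap(a):
    """Check the heap invariant: every parent is <= each of its children."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))

heapified = list(data)
heapq.heapify(heapified)  # O(n): only the heap invariant is established

print(is_min_heap(heapified))        # the invariant holds
print(heapified == sorted(data))     # but the list is generally not sorted

# Extracting sorted order from the heap is the expensive part:
# n pops, each O(log n), i.e. O(n log n) in total (this is heapsort).
work = list(heapified)
in_order = [heapq.heappop(work) for _ in range(len(work))]
print(in_order == sorted(data))
```

So heapify plus n pops costs O(n log n) overall, while the k-way merge reaches the same sorted result in O(n log k) by exploiting the fact that its inputs are already sorted.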