Search code examples
c++searchheap

Time complexity of using heaps to find Kth largest element


I have some different implementations of the code for finding the Kth largest element in an unsorted array. The three implementations I use all use either min/max heap, but I am having trouble figuring out the runtime complexity for one of them.

Implementation 1:

int findKthLargest(vector<int> vec, int k)
{
    // build min-heap
    make_heap(vec.begin(), vec.end(), greater<int>());

    for (int i = 0; i < k - 1; i++) {
        vec.pop_back();
    }

    return vec.back();
}

Implementation 2:

int findKthLargest(vector<int> vec, int k)
{
    // build max-heap
    make_heap(vec.begin(), vec.end());

    for (int i = 0; i < k - 1; i++) {
        // move max. elem to back (from front)
        pop_heap(vec.begin(), vec.end()); 
        vec.pop_back();
    }

    return vec.front();
}

Implementation 3:

int findKthLargest(vector<int> vec, int k)
{
    // max-heap prio. q
    priority_queue<int> pq(vec.begin(), vec.end());

    for (int i = 0; i < k - 1; i++) {
        pq.pop();
    }

    return pq.top();
}

From my reading, I am under the assumption that the runtime for the SECOND one is O(n) + O(klogn) = O(n + klogn). This is because building the max-heap is done in O(n) and popping it will take O(logn)*k if we do so 'k' times.

However, here is where I am getting confused. For the FIRST one, with a min-heap, I assume building the heap is O(n). Since it is a min-heap, larger elements are in the back. Then, popping the back element 'k' times will cost k*O(1) = O(k). Hence, the complexity is O(n + k).

And similarly, for the third one, I assume the complexity is also O(n + klogn) with the same reasoning I had for the max-heap.

But, some sources still say that this problem cannot be done faster than O(n + klogn) with heaps/pqs! In my FIRST example, I think this complexity is O(n + k), however. Correct me if I'm wrong. Need help thx.


Solution

  • Properly implemented, getting the kth largest element from a min-heap is O((n-k) * log(n)). Getting the kth largest element from a max-heap is O(k * log(n)).

    Your first implementation is not at all correct. For example, if you wanted to get the largest element from the heap (k == 1), the loop body would never be executed. Your code assumes that the last element in the vector is the largest element on the heap. That is incorrect. For example, consider the heap:

     1
    3 2
    

    That is a perfectly valid heap, which would be represented by the vector [1,3,2]. Your first implementation would not work to get the 1st or 2nd largest element from that heap.

    The second solution looks like it would work.

    Your first two solutions end up removing items from vec. Is that what you intended?

    The third solution is correct. It takes O(n) to build the heap, and O((k - 1) log n) to remove the (k-1) largest items. And then O(1) to access the largest remaining item.

    There is another way to do it, that is potentially faster in practice. The idea is:

    build a min-heap of size k from the first k elements in vec
    for each following element
        if the element is larger than the smallest element on the heap
            remove the smallest element from the heap
            add the new element to the heap
    return element at the top of the heap
    

    This is O(k) to build the initial heap. Then it's O((n-k) log k) in the worst case for the remaining items. The worst case occurs when the initial vector is in ascending order. That doesn't happen very often. In practice, a small percentage of items are added to the heap, so you don't have to do all those removals and insertions.

    Some heap implementations have a heap_replace method that combines the two steps of removing the top element and adding the new element. That reduces the complexity by a constant factor. (i.e. rather than an O(log k) removal followed by an O(log k) insertion, you get an constant time replacement of the top element, followed by an O(log k) sifting it down the heap).