c++ algorithm min-heap

How is a min heap used here to solve this?


I would like to know how a min heap is used here to solve the following problem.

My idea was to use a hash table to store the count of each number, but I don't know how to use the min heap to continue from there.

Given a non-empty array of integers, return the k most frequent elements.

For example, given [1,1,1,2,2,3] and k = 2, return [1,2].

Note: You may assume k is always valid, 1 ≤ k ≤ number of unique elements. Your algorithm's time complexity must be better than O(n log n), where n is the array's size.

vector<int> topKFrequent(vector<int>& nums, int k) {
    unordered_map<int, int> counts;
    priority_queue<int, vector<int>, greater<int>> max_k;
    for(auto i : nums) ++counts[i];
    for(auto & i : counts) {
        max_k.push(i.second);
        // Size of the min heap is maintained at equal to or below k
        while(max_k.size() > k) max_k.pop();
    }
    vector<int> res;
    for(auto & i : counts) {
        if(i.second >= max_k.top()) res.push_back(i.first);
    }
    return res;
}

Solution

  • The code works like this:

    for(auto i : nums) ++counts[i];  // Use a map to count how many times
                                     // each number occurs in the input

    priority_queue<int, vector<int>, greater<int>> max_k;  // Use a priority_queue
                                                           // that keeps the smallest
                                                           // value at the top (a min heap)

    for(auto & i : counts) {
        max_k.push(i.second);                 // Push each number's count
                                              // onto the priority_queue

        while(max_k.size() > k) max_k.pop();  // If the queue holds more than
                                              // k counts, remove the smallest.
                                              // Only the k largest counts
                                              // need to be tracked
    }

    vector<int> res;                                         // max_k.top() is now the
    for(auto & i : counts) {                                 // k-th largest count, so
        if(i.second >= max_k.top()) res.push_back(i.first);  // collect every number whose
    }                                                        // count is at least that large
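
    To make the heap's role concrete, here is a minimal sketch that isolates the counting and heap steps. The helper name kthLargestFrequency is hypothetical (it does not appear in the question), and the sketch assumes 1 <= k <= number of distinct values, as the problem statement promises. After the loop the heap holds the k largest counts, so its top is the k-th largest count, which is the threshold the final loop compares against.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <unordered_map>
#include <vector>

// Hypothetical helper: returns the k-th largest frequency among the values in nums.
// Assumes 1 <= k <= number of distinct values.
int kthLargestFrequency(const std::vector<int>& nums, std::size_t k) {
    // Step 1: count how often each value occurs.
    std::unordered_map<int, int> counts;
    for (int x : nums) ++counts[x];
    assert(k >= 1 && k <= counts.size());

    // Step 2: min heap of counts, capped at k entries.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (const auto& entry : counts) {
        heap.push(entry.second);
        // One push per iteration, so the size exceeds k by at most one.
        if (heap.size() > k) heap.pop();  // discard the smallest count
    }

    // The heap now holds the k largest counts; the smallest of them is on top.
    return heap.top();
}
```

    For the example input [1,1,1,2,2,3] the counts are 3, 2 and 1, so calling the helper with k = 2 returns 2, and both 1 (count 3) and 2 (count 2) pass the >= test.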
    

    EDIT

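    As pointed out by @SergeyTachenov, your result vector may contain more than k elements: every number whose count equals max_k.top() passes the >= test, so all numbers tied at the k-th largest count are included. One possible fix is to stop once k numbers have been collected: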

    for(auto & i : counts) {
        if(i.second >= max_k.top()) res.push_back(i.first);
        if (res.size() == k) break; // Stop when k numbers are found
    }
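
    One caveat with stopping early: unordered_map iterates in an unspecified order, so when several numbers tie at the threshold the loop can fill res with tied numbers before it reaches a strictly more frequent one. With [1,1,2,2,3,3,3] and k = 2, for instance, the threshold is 2 and the loop may stop after 1 and 2 without ever seeing 3. A possible alternative, sketched here as a variation and not as the original author's method, is to keep (count, value) pairs in the heap so that the heap itself ends up holding exactly the k answers. The function name topKFrequentByPairs is hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical variant: the min heap stores (count, value) pairs, so the
// k entries left in the heap are the answer. Assumes 1 <= k <= number of
// distinct values.
std::vector<int> topKFrequentByPairs(const std::vector<int>& nums, std::size_t k) {
    std::unordered_map<int, int> counts;
    for (int x : nums) ++counts[x];

    using Entry = std::pair<int, int>;  // (count, value)
    // std::greater on pairs compares the count first, then the value.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : counts) {
        heap.emplace(kv.second, kv.first);
        if (heap.size() > k) heap.pop();  // drop the least frequent entry
    }

    // Drain the heap; values come out in ascending order of frequency.
    std::vector<int> res;
    while (!heap.empty()) {
        res.push_back(heap.top().second);
        heap.pop();
    }
    return res;
}
```

    This always returns exactly k values and never drops a strictly more frequent number. When counts tie at the boundary, the pair ordering keeps the larger value, which is an arbitrary tie-break chosen for this sketch. The cost is O(n) for counting plus O(m log k) for the heap, where m is the number of distinct values.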
    

    Another small comment

    You don't really need a while-statement here:

    while(max_k.size() > k) max_k.pop();
    

    An if-statement would do: each loop iteration pushes exactly one count, so the size can exceed k by at most one.