Search code examples
c++heapmedian

C++ Implementing Heap Median Function


Following the answer found here, https://stackoverflow.com/a/10931091/1311773, I am attempting to implement two heaps so I can calculate a running median.

I'm unfamiliar with heaps, and am not sure where to begin implementing this function described here. http://programmingpraxis.com/2012/05/29/streaming-median/

My goal is to create a small test program that calculates running medians efficiently, so as the list grows the median doesn't need to be recalculated from scratch. Using two heaps, I should be able to do it, I'm just shaky on how to start implementing it.

Any advice on this would be appreciated.


Solution

  • The std::priority_queue template provides all the properties of a heap. Constant time maximum or minimum extraction (depending on how the items are compared), and logarithmic time insertion. It can be found in the <queue> header file.

    By default, the priority_queue is a max-heap.

    int numbers[11] = { 0, 9, 3, 4, 8, 12, 2, 11, 10, 1, 5 };
    std::priority_queue<int> myheap(numbers, numbers + 11);
    std::cout << "biggest " << myheap.top() << std::endl; // 12
    myheap.pop();
    std::cout << "biggest " << myheap.top() << std::endl; // 11
    myheap.push(6);
    std::cout << "biggest " << myheap.top() << std::endl; // 11
    myheap.push(13);
    std::cout << "biggest " << myheap.top() << std::endl; // 13
    

    Here is an example of how to create a min-heap:

    std::priority_queue<
        int,
        std::priority_queue<int>::container_type,
        std::greater<int>
    >;
    

    To implement the streaming median algorithm, the approach is similar to this:

    • create a max-heap for items that are smaller than the median
    • create a min-heap for items that are greater than the median
    • when pushing new items into the heap
      • decide which heap it should be pushed into, and push it there
      • if one of the heaps' size is more than 1 greater than the other heap, then pop the bigger one, and put that element into the smaller one

    Then, the median is the either the top of the larger heap, or the average of the tops of both heaps.

    If you feel you need to manage the heap manually, C++ provides algorithms that allow you to do so on your own random access data structure.

    • std::make_heap -- heapify a region specified by iterator endpoints
    • std::push_heap -- assumes the first N-1 elements are already a heap, and incoporates the N'th element into the heap
    • std::pop_heap -- places the first element in the region into the last position, and reheapifies the region, but leaving the last element alone