Can anyone suggest methods for, or link to implementations of, fast median finding for dynamic ranges in C++? For example, suppose the range grows with each iteration of my program, and I want to find the median after each step.
Range
4
3,4
8,3,4
2,8,3,4
7,2,8,3,4
So the above would ultimately produce 5 median values, one for each line.
The best you can get without also keeping track of a sorted copy of your array is re-using the old median and updating it with a linear-time search for the next-larger (or next-smaller) value. This might sound simple, but there are problems we have to solve.
Consider the following list (sorted for easier understanding, but you keep them in an arbitrary order):
1, 2, 3, 3, 3, 4, 5
//       *
So here, the median is 3 (the middle element, since the list is sorted). Now if you add a number which is greater than the median, this potentially "moves" the median to the right by one half index. I see two problems: How can we advance by one half index? (By definition, when the position falls between two elements, the median is the mean of those two neighbouring values.) And how do we know at which 3 the median was, when we only know the median was 3?
This can be solved by storing not only the current median but also the position of the median within the numbers of the same value. Here it has an "index offset" of 1, since it's the second 3. Adding a number greater than or equal to 3 to the list changes the index offset to 1.5. Adding a number less than 3 changes it to 0.5.
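To check the offset arithmetic, here is a small brute-force sketch that sorts a copy of the list and measures where the median position falls inside the run of equal values. The function name indexOffset is made up for this illustration, and it assumes a non-empty list that contains the given value.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not part of any library): offset of the median
// position inside the run of elements equal to `value`.
// Assumes `values` is non-empty and contains `value`.
// The vector is taken by value so the caller's order is left untouched.
double indexOffset(std::vector<int> values, int value) {
    std::sort(values.begin(), values.end());
    // Index of the first element equal to `value` in sorted order.
    auto first = std::lower_bound(values.begin(), values.end(), value) - values.begin();
    // Median position in sorted order; a half index when the size is even.
    double medianPos = (values.size() - 1) / 2.0;
    return medianPos - static_cast<double>(first);
}
```

For 1, 2, 3, 3, 3, 4, 5 and the value 3 this gives 1. Appending a 9 or another 3 gives 1.5, and appending a 0 gives 0.5.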
When this number becomes less than zero, the median changes. It also has to change if it goes beyond the count of equal numbers (minus 1), in this case 2, meaning the new median is greater than the last equal number. In both cases, you have to search for the next smaller / next greater number and update the median value. To always know what the upper limit for the index offset is (in this case 2), you also have to keep track of the count of equal numbers.
This should give you a rough idea of how to implement median updating in linear time.
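Here is a minimal sketch of that idea, under a few assumptions of mine: the class name RunningMedian and its members are hypothetical, the elements are ints, and the offset is stored doubled (offset2_) so that half indices stay integers. When the offset overshoots the run by exactly half an index, this sketch keeps the current run and averages with the neighbouring value. It only moves to the neighbouring run once the median position lies fully outside.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical class: keeps the data unsorted and updates the median
// with linear-time scans, as outlined in the text above.
class RunningMedian {
public:
    // Adds x and returns the median of all values inserted so far.
    double push(int x) {
        data_.push_back(x);
        if (data_.size() == 1) {  // the first element starts the run
            value_ = x;
            count_ = 1;
            offset2_ = 0;
            return static_cast<double>(x);
        }
        if (x == value_) ++count_;
        // Growing the list moves the median right by half an index; an
        // element below the run also shifts the run itself right by one.
        offset2_ += (x < value_) ? -1 : 1;

        if (offset2_ <= -2) {  // median lies fully left of the run
            value_ = neighbour(false);
            count_ = countEqual(value_);
            offset2_ += 2 * count_;
        } else if (offset2_ >= 2 * count_) {  // fully right of the run
            offset2_ -= 2 * count_;
            value_ = neighbour(true);
            count_ = countEqual(value_);
        }

        // Half an index outside the run: average with the adjacent value.
        if (offset2_ == -1)
            return (static_cast<double>(neighbour(false)) + value_) / 2.0;
        if (offset2_ == 2 * count_ - 1)
            return (static_cast<double>(value_) + neighbour(true)) / 2.0;
        return static_cast<double>(value_);
    }

private:
    // Linear scan for the smallest value above value_ (greater == true)
    // or the largest value below it (greater == false).
    int neighbour(bool greater) const {
        bool found = false;
        int best = value_;
        for (int v : data_) {
            if (greater ? (v <= value_) : (v >= value_)) continue;
            if (!found || (greater ? v < best : v > best)) {
                best = v;
                found = true;
            }
        }
        assert(found);  // callers only ask when such a value exists
        return best;
    }

    long long countEqual(int v) const {
        return static_cast<long long>(std::count(data_.begin(), data_.end(), v));
    }

    std::vector<int> data_;  // values in arbitrary (insertion) order
    int value_ = 0;          // value of the run the median is tracked against
    long long count_ = 0;    // how many stored values equal value_
    long long offset2_ = 0;  // twice the index offset within the run
};
```

Pushing 4, 3, 8, 2, 7 in that order (the ranges from the question grow at the front) returns 4, 3.5, 4, 3.5, 4. Each push costs at most a few linear scans and no sorted copy is kept.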