I was reading the article on finding the k-th highest element in an array through the median-of-medians
algorithm at ardendertat. In the portion explaining the complexity, the author seems to have discounted one factor, the cost of finding the median-of-medians
for each partition recursively. Surely I can't partition all sub-arrays by the initial pivot, right? So won't this add to the complexity?
The position p
of the median of medians after partitioning the array is between 0.3*n
and 0.7*n
.
So after one round, we have three possibilities:
p == n-k
, by pure luck, the first pivot is the k
-th largest element, found in O(n) (median of medians and partitioning are both O(n)).p > n-k
, then the k
-th largest element is smaller than the first pivot, and we need to find the k - (n-p)
-th largest element of the first part, which has at most 0.7*n
elements, so the total cost for finding the k
-th largest element is
T(n) <= T(0.7*n) + C*n
p < n-k
, then we need to find the k
-th largest element of the second part (after the pivot), that part is also at most 0.7*n
elements large, so we again have the estimate
T(n) <= T(0.7*n) + C*n
Iterating, we find
T(n) <= T((0.7)^k * n) + C*(1 + 0.7 + ... + (0.7)^(k-1))*n
<= T(1) + C/(1 - 0.7)*n
which evidently is O(n).