algorithm computation-theory median-of-medians

Where does the constant 5 come from in the median-of-medians algorithm?

I've been trying to understand where the "5" comes from in the Median of Medians algorithm, but can't seem to find a simple description of how it is derived and why it is optimal.

For example, why isn't say 7 a viable option?

The only advantage I can see for 5 is that it has 2 items on each side of the middle making a sort over the 5 items a simple case of no more than 3 swaps.

Solution

The 5 is chosen because it's the smallest value for which the recurrence solves to O(n). 7 works as well, but tends to be slower.

More concretely: if you break the input into blocks of size 5, you get this recurrence:

T(n) ≤ T(n/5) + T(7n/10) + O(n)

This solves to O(n), since the work decays geometrically per level.

If we use blocks of size 3, we get

T(n) ≥ T(n/3) + T(2n/3) + O(n)

Which solves to Ω(n log n).

Choosing blocks of size 7 gives

T(n) ≤ T(n / 7) + T(5n / 7) + O(n)

This also solves to O(n) because the work decays geometrically. However, the constant in the big-O term is larger than in the 5 case because sorting and taking the median of n/7 blocks of size 7 is more work than sorting and taking the median of n/5 blocks of size 5. Accordingly, the blocks-of-five case is used more commonly.

Hope this helps!