I've been trying to understand where the "5" comes from in the Median of Medians algorithm, but can't seem to find a simple description of how it is derived and why it is optimal.
For example, why isn't say 7 a viable option?
The only advantage I can see for 5 is that it has 2 items on each side of the middle making a sort over the 5 items a simple case of no more than 3 swaps.
The 5 is chosen because it's the smallest value for which the recurrence solves to O(n). 7 works as well, but tends to be slower.
More concretely: if you break the input into blocks of size 5, you get this recurrence:
T(n) ≤ T(n/5) + T(7n/10) + O(n)
This solves to O(n), since the work decays geometrically per level.
If we use blocks of size 3, we get
T(n) ≥ T(n/3) + T(2n/3) + O(n)
Which solves to Ω(n log n).
Choosing blocks of size 7 gives
T(n) ≤ T(n / 7) + T(5n / 7) + O(n)
This also solves to O(n) because the work decays geometrically. However, the constant in the big-O term is larger than in the 5 case because sorting and taking the median of n/7 blocks of size 7 is more work than sorting and taking the median of n/5 blocks of size 5. Accordingly, the blocks-of-five case is used more commonly.
Hope this helps!