
Practicality and impact of worst case avoidance strategies for Quicksort


The average and worst cases of a simple Quicksort are well known: O(n log n) and O(n^2).

To avoid the worst cases (nearly sorted data), I have come across a "strategy family" that I call "median of x" or "Mox", where x is usually 3, though I have also seen 5.

The aim is to determine a pivot that is better suited to handle the values in the current partition should their ordering be skewed.

What happens in "Mo3" is that three values (the first, middle and last) in the current partition are chosen and compared to determine the median of the three, which is then assumed to be a better pivot than always choosing the middle value of the partition.
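For concreteness, a Mo3 selection might look something like the sketch below (the function name, the int element type and the choice of returning an index are illustrative only, not taken from any particular implementation):

```cpp
#include <cstddef>
#include <vector>

// Return whichever of lo, mid or hi holds the median of the three values:
// the "Mo3" pivot choice described above.  Two or three comparisons.
std::size_t median_of_three(const std::vector<int>& a,
                            std::size_t lo, std::size_t hi)
{
    std::size_t mid = lo + (hi - lo) / 2;
    if (a[mid] < a[lo]) {
        if (a[hi] < a[mid]) return mid;                     // hi < mid < lo
        return a[hi] < a[lo] ? hi : lo;                     // mid is smallest
    } else {
        if (a[hi] < a[mid]) return a[hi] < a[lo] ? lo : hi; // mid is largest
        return mid;                                         // lo <= mid <= hi
    }
}
```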

Performing a single Mo3 means going through the motions of an insertion sort with n=3, which has a best case of 2 comparisons, an average of 8/3 and a worst case of 3. For an Mo5 the corresponding values are 4, 167/15 and 17.

As I see it, these Mox's will negatively impact the performance of Quicksort by contributing their comparisons to the total every time they are applied to a partition. Of course there is a cutoff point when the Mox is no longer required, probably before reaching a partition size of 3 when using Mo3 and before a partition size of 5 when using Mo5.

So far, I haven't come across a discussion which quantifies their impact. It is almost as if the discussions don't see that the comparisons necessary for using Mox's contribute to the number of comparisons their host quicksort function will generate.

A common raison d'être for Mox's is along the lines of "make the worst cases less bad", but what is seldom said is that they also "make the average case less good".

So my questions are (comparing a basic Quicksort with one using Mox) "how much less bad?" and "how much less good?"


OOPS: I used the average and worst-case values for Quicksort when it should have been those for insertion sort. The correct values (for Mo5) are 7.716667 and 10.
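For what it is worth, both sets of figures can be checked by brute force over all permutations. A small sketch (my own counting harness, assuming a plain linear-probe insertion sort) reproduces 8/3 and 3 for n=3, and the corrected 7.716667 and 10 for n=5:

```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// Count the comparisons an insertion sort makes on one input.
static long insertion_sort_comparisons(std::vector<int> a)
{
    long cmps = 0;
    for (std::size_t i = 1; i < a.size(); ++i) {
        int v = a[i];
        std::size_t j = i;
        while (j > 0) {
            ++cmps;                      // one comparison per probe of a[j-1]
            if (a[j - 1] <= v) break;
            a[j] = a[j - 1];
            --j;
        }
        a[j] = v;
    }
    return cmps;
}

int main()
{
    for (int n : {3, 5}) {
        std::vector<int> p(n);
        std::iota(p.begin(), p.end(), 0);
        long total = 0, worst = 0, count = 0;
        do {
            long c = insertion_sort_comparisons(p);
            total += c;
            worst = std::max(worst, c);
            ++count;
        } while (std::next_permutation(p.begin(), p.end()));
        std::printf("n=%d  average=%f  worst=%ld\n",
                    n, double(total) / count, worst);
    }
}
```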


The expression "choosing a random pivot" confuses me.

If the data to be sorted is a vector of (perhaps random) numbers, I guess choosing a random number is straightforward if you know the range of values in the vector, such that it is possible to choose a value inside the range. Sorting a vector of numeric values seems to me to be a non-real-life scenario, more applicable to researching the basic functioning of an algorithm. Outside of research, how often would it be necessary to sort a vector of numerical values?

What if it is a real-life scenario where complex sorting is performed on records in a database table, where the sorting performed is a mix of comparing several fields in the record, for example zip code, sex, birth date, etc.? Would the "random pivot" then correspond to a random physical record within the table?


Solution

  • On the whole, the base strategy for picking a pivot should be to pick a random element. The cost of this strategy depends, of course, on the cost of generating random numbers (one per partition, so on average O(N) times), but unlike any strategy involving picking pivots from fixed positions, the stochastic strategy is not open to DoS attacks (provided that the random number generator is not open to prediction attacks).

    Always choosing the first (or last) element of the partition is not only open to DoS attacks; it also results in worst-case behaviour for already sorted data. That's certainly not acceptable in any normal environment where sorted or partially-sorted inputs can be expected, so it should be rejected off the top.

    Median-of-3 (or Median-of-k) strategies where the objects whose median is computed are at fixed positions are subject to DoS attacks. In environments where this is possible, the strategy can be improved by randomly selecting one or more values to participate in the median computation. I believe (although I don't propose to provide any proof of the conjecture) that it is sufficient to select the "middle" value at random, although there might be some tiny benefit to selecting all three values at random. The additional cost is potentially significant, though: two (more) random numbers, and two additional swaps. (Substitute k−1 for 2 if you are using median-of-k.)

    The following comparison is based on the above assumptions -- that we will reject the strategy of taking the pivot to be the first element, and that we can do median-of-3 with the median of the first, last and a single randomly-selected element. Since both of these choose one random number per partition, I'll ignore the cost of generating random numbers. I will also use n consistently as the size of the partition, and assume that it is, at least, 4 so that there is no issue with finding three elements for the median. (Generally, real-world QS implementations use a different sort algorithm for small partitions; I think it is fair to assume that for partitions of sizes less than some small number, there are hard-coded sorting functions.)

    In the stochastic-pivot strategy, we proceed as follows (a sketch in code follows the list):

    1. Generate a random index r in the half-open range [0, n).

    2. Swap elements 0 and r. Element 0 will be the pivot value.

    3. Partition the remaining n−1 elements, which involves comparing every one of them with the pivot value once, for a total of n−1 comparisons. The number of swaps is harder to estimate (and varies with partition algorithm) but it is certainly O(n).
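    A minimal sketch of this strategy, assuming a Lomuto-style partition with the pivot parked at the front (the function name, the int element type and the use of std::mt19937 are illustrative choices, not requirements):

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// One partitioning step of the stochastic-pivot strategy: pick a random
// pivot, move it to the front, partition the remaining n-1 elements
// (n-1 comparisons), and return the pivot's final index.
std::size_t stochastic_partition(std::vector<int>& a,
                                 std::size_t lo, std::size_t hi,   // [lo, hi)
                                 std::mt19937& rng)
{
    std::size_t n = hi - lo;

    // Step 1: random index r in [0, n), expressed relative to lo.
    std::uniform_int_distribution<std::size_t> dist(0, n - 1);
    std::size_t r = lo + dist(rng);

    // Step 2: swap the randomly chosen element to position lo; it is the pivot.
    std::swap(a[lo], a[r]);
    int pivot = a[lo];

    // Step 3: partition a[lo+1 .. hi-1] against the pivot, then swap the
    // pivot into its final place.  The swap count is O(n), as noted above.
    std::size_t store = lo;
    for (std::size_t i = lo + 1; i < hi; ++i)
        if (a[i] < pivot)
            std::swap(a[++store], a[i]);
    std::swap(a[lo], a[store]);
    return store;
}
```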

    In the stochastic-median-of-three strategy, we will first arrange for the elements at positions 0, 1 and n−1 to be sorted in order. So the procedure will be (again, a sketch in code follows the list):

    1. Generate a random index r in the half-open range [1, n−1).

    2. Compare elements at locations 0 and n−1; swap them if necessary to put them in order. (1 compare; worst-case 1 swap)

    3. Compare element 0 with element r. If element r is smaller, rotate elements 0, 1 and r. Otherwise, compare element r with element n−1 and if element r is bigger, rotate elements n−1, 1 and r. (Worst case: 2 compares and 1 rotate.) The pivot value is now the value at position 1.

    4. Use the same partitioning algorithm, but this time involving n−3 elements, since elements 0 and n−1 are already correctly placed. This involves n−3 comparisons and O(n) swaps.
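    Putting steps 1 to 3 together in code, a sketch might look like this (names are illustrative; one detail left implicit above is where the pivot ends up when no rotate is needed, so the sketch spends one extra swap to put it at position 1 in that case as well):

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Steps 1-3 of the stochastic median-of-three procedure: afterwards
// a[lo] <= pivot <= a[hi-1] and the pivot sits at position lo+1, so the
// caller only partitions the n-3 elements a[lo+2 .. hi-2].
// Assumes n = hi - lo >= 4, as in the discussion above.
void place_stochastic_mo3_pivot(std::vector<int>& a,
                                std::size_t lo, std::size_t hi,
                                std::mt19937& rng)
{
    // Step 1: random index r in the half-open range [lo+1, hi-1).
    std::uniform_int_distribution<std::size_t> dist(lo + 1, hi - 2);
    std::size_t r = dist(rng);

    // Step 2: order the first and last elements (1 compare, at most 1 swap).
    if (a[hi - 1] < a[lo])
        std::swap(a[lo], a[hi - 1]);

    // Step 3: at most 2 more compares and one 3-element rotate.
    if (a[r] < a[lo]) {
        // a[r] is the smallest; the old a[lo] is the median.
        int t = a[lo]; a[lo] = a[r]; a[r] = a[lo + 1]; a[lo + 1] = t;
    } else if (a[hi - 1] < a[r]) {
        // a[r] is the largest; the old a[hi-1] is the median.
        int t = a[hi - 1]; a[hi - 1] = a[r]; a[r] = a[lo + 1]; a[lo + 1] = t;
    } else {
        // a[r] itself is the median: one extra swap (not counted in the
        // tally above) moves it to position lo+1.
        std::swap(a[lo + 1], a[r]);
    }
}
```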

    Now, to compare the two strategies.

    • The cost of step 1 is identical in the two cases, since the cost of generating a random number does not depend on the range of possibilities (at least, given these two possible ranges).

    • The cost of step 2 in the first strategy is zero compares and one swap. The cost of steps 2 and 3 combined in the second strategy is two or three compares (expected 8/3), zero or one swaps (expected 1/2) and zero or one rotate (expected 2/3). The difference in worst cases is three compares and one rotate; the difference in expected cost is 8/3 compares and 2/3 of a rotate less 1/2 of a swap.

      (Parenthetically, a swap is three mutations -- t = a; a = b; b = t -- while a rotate is four mutations -- t = a; a = b; b = c; c = t. If we approximate a rotate as 4/3 of a swap, the worst case difference is 4/3 of a swap, and the expected case is 7/18 of a swap.)

    • The difference between step 3 of the first strategy and step 4 of the second strategy is in the opposite direction: stochastic median-of-3 uses 2 fewer compares and O(1) fewer swaps.

    So, adding all that up, we can estimate the per-partition cost of stochastic median-of-3 to be at worst one compare, and on average 2/3 of a compare. (I'm not entirely sure how to compare the swap count, but I think stochastic-median-of-3 comes out ahead in the worst case and approximately equal in the average case.)

    Now, the question is: does stochastic median-of-3 provide enough benefits to outweigh 2/3 of a compare per partition? It certainly wouldn't take much benefit to justify that cost, but I'm a little sceptical that it provides even that little.

    If we were to use median-of-5 instead of median-of-3, then the cost/benefit is more skewed. Median-of-5 saves an additional two compares in the partitioning phase (since the remaining subarray is now n−5) but the cost of computing the median is somewhat higher. It is not as high as suggested by the OP, however: it is possible to arrange five elements in sequence with worst-case seven comparisons (the expected number is only very slightly smaller than seven) using a well-known algorithm described in this SO question. But even with the cost being seven (instead of the average 11.17/worst-case 17 suggested in the OP), it is still quite a bit larger than the saving of four comparisons in the partitioning phase.
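    For reference, the seven-comparison bound comes from specialising merge insertion (Ford-Johnson) to five elements. The sketch below is my own code rather than the implementation from the linked question; the harness checks the bound by brute force over all 120 permutations and reports a worst case of 7 and an average just under 7:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdio>
#include <numeric>
#include <vector>

// Sort five values with at most seven comparisons (merge insertion
// specialised to n = 5).  `cmps` accumulates the comparisons performed.
static std::array<int, 5> sort5(std::array<int, 5> v, long& cmps)
{
    auto less = [&cmps](int x, int y) { ++cmps; return x < y; };

    if (less(v[1], v[0])) std::swap(v[0], v[1]);                            // 1
    if (less(v[3], v[2])) std::swap(v[2], v[3]);                            // 2
    if (less(v[3], v[1])) { std::swap(v[0], v[2]); std::swap(v[1], v[3]); } // 3
    // Now v0 <= v1 <= v3 and v2 <= v3.

    // Binary-insert v4 into the chain (v0, v1, v3): exactly 2 comparisons.
    std::vector<int> chain = {v[0], v[1], v[3]};
    std::size_t pos = less(v[4], v[1]) ? (less(v[4], v[0]) ? 0 : 1)
                                       : (less(v[4], v[3]) ? 2 : 3);        // 4, 5
    chain.insert(chain.begin() + pos, v[4]);

    // Binary-insert v2 among the elements known to lie below v3
    // (two or three of them): at most 2 more comparisons.
    std::size_t p;
    if (pos == 3)   // v3 is at chain[2]; only chain[0..1] lie below it
        p = less(v[2], chain[1]) ? (less(v[2], chain[0]) ? 0 : 1) : 2;
    else            // v3 is at chain[3]; chain[0..2] lie below it
        p = less(v[2], chain[1]) ? (less(v[2], chain[0]) ? 0 : 1)
                                 : (less(v[2], chain[2]) ? 2 : 3);
    chain.insert(chain.begin() + p, v[2]);

    std::array<int, 5> out;
    std::copy(chain.begin(), chain.end(), out.begin());
    return out;
}

int main()
{
    std::array<int, 5> v;
    std::iota(v.begin(), v.end(), 0);
    long total = 0, worst = 0, count = 0;
    do {
        long c = 0;
        std::array<int, 5> s = sort5(v, c);
        assert(std::is_sorted(s.begin(), s.end()));
        total += c;
        worst = std::max(worst, c);
        ++count;
    } while (std::next_permutation(v.begin(), v.end()));
    std::printf("worst=%ld  average=%f\n", worst, double(total) / count);
}
```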

    So I conclude that stochastic median-of-3 might be slightly better, though I suspect not by enough to justify the additional code complexity; stochastic median-of-5 definitely is not.