Why does Quicksort have deep function call stack with many duplicates in input?

I'm reading Elements of Programming Interviews, and on page 43, there's the following line/quote:

Implemented naively, quicksort has large run times and deep function call stacks on arrays with many duplicates because the subarrays may differ greatly in size.

One solution is to reorder the array so that all emements less than the pivot appear first, followed by elements equal to the pivot, followed by elements greater than the pivot.

This is known as Dutch National Flag partitioning.

I'm having a lot of trouble visualising the problem described in the first line, and so I don't understand how the Dutch National Flag partitioning scheme helps. I was unable to find a webpage that explains this clearly. Can I please get some help explaining this visually?

Solution

The quick sort algorithm requires as one of its steps some algorithm to partition the data, based on a selected pivot. The ideal scenario is that each partition is as small as possible, so that it requires fewer further divisions. If more than half the elements end up in one partition, more steps will be needed to conclude the algorithm.

As an illustrative example of why duplicates cause this asymmetry, consider sorting the following list, which contains six elements out of eight with the same value:

2, 2, 1, 2, 2, 2, 2, 3

If you were asked to partition this into two lists, you might well put all elements less than the pivot into one partition, and all those greater than or equal to it in another.

This is for instance how the common "Lomuto's algorithm" works, with the pivot itself excluded from both partitions. This algorithm is often considered relatively simple to understand and implement, so may be what the author had in mind with the phrase "naive implementation".

In this scheme, the first step might partition the list as follows:

Pivot: 2
Less than pivot: 1
More than or equal to pivot: 2, 2, 2, 2, 2, 3

The second partition is then recursively partitioned:

Pivot: 2
Less than pivot: [empty list]
More than or equal to pivot: 2, 2, 2, 2, 3

Step 3:

Pivot: 2
Less than pivot: [empty list]
More than or equal to pivot: 2, 2, 2, 3

Step 4:

Pivot: 2
Less than pivot: [empty list]
More than or equal to pivot: 2, 2, 3

Step 5:

Pivot: 2
Less than pivot: [empty list]
More than or equal to pivot: 2, 3

Step 6:

Pivot: 2
Less than pivot: [empty list]
More than or equal to pivot: 3

Here recursion can finally stop. This is much worse than the 3 steps we would expect if the partitions were always of equal size (partitions of 4, 2, and then 1). The choice of pivot doesn't matter, because even if we found the correct position for the 3 sooner, we would still need one step for each 2 in the list.

A "fat pivot" or "Dutch flag" partitioning scheme extends the above by separating out all the values equal to the pivot into a third partition. Rather than just balancing the partitions, this has the result that both partitions are smaller.

In our example, the result immediately looks like this:

Pivot: 2
Equal to pivot: 2, 2, 2, 2, 2
Less than pivot: 1
More than pivot: 3

Since the values in the new partition are all equal, it doesn't need further sorting. In our example, the remaining partitions have one element each, so there is no need to perform any recursion at all. For less extreme cases, the two partitions left to sort will both be smaller, so need fewer further steps to sort.

However, the cost of that one partitioning step will be higher due to the greater complexity of the partitioning algorithm.

Other partitioning schemes have two partitions, but allow values equal to the pivot to be in either partition. This means the partitions can be evenly sized at each step, even with duplicate values (although it doesn't guarantee that they will be). The original algorithm proposed by Tony Hoare when inventing Quick Sort has this property.

In this case, the first step might give a result like this:

Left partition: 2, 2, 1, 2
Right partition: 2, 2, 2, 3

This is less efficient than a "fat pivot" for our extreme example, but much more efficient than having one very large and one very small partition.