Ok, this is driving me crazy. I created a straightforward Quicksort implementation (from CLR) with random choice of pivot:
def qsr(l, s, e):
def qsrpartition(l, s, e):
pivotindex=random.randrange(s,e+1)
l[e], l[pivotindex] = l[pivotindex], l[e]
p = l[e]
i = s - 1
for j in range(s, e):
if l[j] <= p:
i = i+1
l[i], l[j] = l[j], l[i]
l[i+1], l[e] = l[e], l[i+1]
return i+1
if s < e:
q = qsrpartition(l, s, e)
qsr(l, s, q-1)
qsr(l, q+1, e)
I also have a version with static choice of pivot (comment out the first two lines of qsrpartition). If I run the two versions on a random array, the randomized version above is 100 faster than the one always picking the last element as pivot (1s vs 10s).
li = random.choices(range(5000), k=5000)
li2 = random.choices(range(5000), k=5000)
r1 = timeit.timeit(lambda:qs(li, 0, len(li)-1),number=10)
r2 = timeit.timeit(lambda:qsr(li2, 0, len(li2)-1),number=10)
print(r1, r2)
11.10 0.08
The result above holds probabilistically across runs, array lengths, use of sample with or without replacement, order in which I run the functions etc. For a random choice of array, the choice of pivot should not matter, as the last element should be as good as any intermediate. I feel that there are an obvious explanation, which I am missing here.
Your benchmarking code is not measuring what you intended.
As already commented, the lists li
and li2
are sorted in the first of the 10 runs, and the 9 others are sorting already sorted lists. When a partition is already sorted, then choosing the last entry as pivot represents a worst case, giving the random pivot variant the advantage in 9 out of the 10 runs.
I would also make more runs to get a better estimate.
Here is a correction:
k = 5000
runs = 50
print(" qs", timeit.timeit(lambda: qs(random.choices(range(k), k=k), 0, k-1),number=runs))
print("qsr", timeit.timeit(lambda:qsr(random.choices(range(k), k=k), 0, k-1),number=runs))
This gives an output that slightly favors the non-random pivot version for the plain reason it doesn't have to execute those two extra statements. On repl.it I got output like this:
qs 2.3873363960025017
qsr 2.9921084540001175