Search code examples
algorithmsortingdebuggingquicksort

Quicksort performance in Python - random pivot vs static


Ok, this is driving me crazy. I created a straightforward Quicksort implementation (from CLR) with random choice of pivot:

def qsr(l, s, e):
    def qsrpartition(l, s, e):
        pivotindex=random.randrange(s,e+1)
        l[e], l[pivotindex] = l[pivotindex], l[e]
        p = l[e]
        i = s - 1
        for j in range(s, e):
            if l[j] <= p:
                i = i+1
                l[i], l[j] = l[j], l[i]
        l[i+1], l[e] = l[e], l[i+1]
        return i+1
        
    if s < e:
        q = qsrpartition(l, s, e)
        qsr(l, s, q-1)
        qsr(l, q+1, e)    

I also have a version with static choice of pivot (comment out the first two lines of qsrpartition). If I run the two versions on a random array, the randomized version above is 100 faster than the one always picking the last element as pivot (1s vs 10s).

li = random.choices(range(5000), k=5000)
li2 = random.choices(range(5000), k=5000)
r1 = timeit.timeit(lambda:qs(li, 0, len(li)-1),number=10)
r2 = timeit.timeit(lambda:qsr(li2, 0, len(li2)-1),number=10)
print(r1, r2)

11.10 0.08

The result above holds probabilistically across runs, array lengths, use of sample with or without replacement, order in which I run the functions etc. For a random choice of array, the choice of pivot should not matter, as the last element should be as good as any intermediate. I feel that there are an obvious explanation, which I am missing here.


Solution

  • Your benchmarking code is not measuring what you intended.

    As already commented, the lists li and li2 are sorted in the first of the 10 runs, and the 9 others are sorting already sorted lists. When a partition is already sorted, then choosing the last entry as pivot represents a worst case, giving the random pivot variant the advantage in 9 out of the 10 runs.

    I would also make more runs to get a better estimate.

    Here is a correction:

    k = 5000
    runs = 50
    print(" qs", timeit.timeit(lambda: qs(random.choices(range(k), k=k), 0, k-1),number=runs))
    print("qsr", timeit.timeit(lambda:qsr(random.choices(range(k), k=k), 0, k-1),number=runs))
    

    This gives an output that slightly favors the non-random pivot version for the plain reason it doesn't have to execute those two extra statements. On repl.it I got output like this:

     qs 2.3873363960025017
    qsr 2.9921084540001175