Tags: algorithm, sorting, haskell, quicksort, mergesort

Why is QuackSort 2x faster than Data.List's sort for random lists?


I was looking for the canonical implementation of MergeSort in Haskell to port to HOVM, and I found this StackOverflow answer. When porting the algorithm, I realized something looked silly: the algorithm has a "halve" function that does nothing but split a list in two, using half of the length, before recursing and merging. So I thought: why not make better use of this pass and use a pivot, making each half respectively smaller and bigger than that pivot? That would increase the odds that the recursive merge calls are applied to already-sorted lists, which might speed up the algorithm!
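
For reference, a conventional halve-based top-down mergesort looks roughly like this (a sketch of the usual scheme with my own naming, not the exact code from the linked answer):

-- Sketch of the usual top-down mergesort: the "halve" pass splits by length
-- only and never looks at the element values.
msort :: Ord a => [a] -> [a]
msort []  = []
msort [x] = [x]
msort xs  = merge (msort left) (msort right) where
  (left, right) = splitAt (length xs `div` 2) xs   -- the "halve" pass
  merge []       ys       = ys
  merge xs       []       = xs
  merge (x : xs) (y : ys)
    | x < y     = x : merge xs (y : ys)
    | otherwise = y : merge (x : xs) ys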

I've done this change, resulting in the following code:

import Data.List
import Data.Word

-- Deterministic pseudo-random list via a linear congruential generator
randomList :: Word32 -> Word32 -> [Word32]
randomList seed 0    = []
randomList seed size = seed : randomList (seed * 1664525 + 1013904223) (size - 1)

-- Mergesort variant: partition around the head element (the pivot), sort
-- both halves recursively, then merge them
quacksort :: [Word32] -> [Word32]
quacksort []           = []
quacksort [x]          = [x]
quacksort (p : x : xs) = split p (p : x : xs) [] [] where

  -- Splits the list into two halves: elements not above the pivot and elements above it
  split p []       as bs = merge (quacksort as) (quacksort bs)
  split p (x : xs) as bs = quack p (p < x) x xs as bs

  -- Helper function for `split`; the Bool is (p < x) and decides which half x joins
  quack p False x xs as bs = split p xs (x : as) bs
  quack p True  x xs as bs = split p xs as (x : bs)

  -- Merges two sorted lists into one sorted list
  merge []       ys       = ys
  merge xs       []       = xs
  merge (x : xs) (y : ys) = place (x < y) x xs y ys

  -- Helper function for `merge`
  place False x xs y ys = y : merge (x : xs) ys
  place True  x xs y ys = x : merge xs (y : ys)

main :: IO ()
main = do
  let l = randomList 0 2000000
  let b = quacksort l
  print $ sum b

I then benchmarked it and, to my surprise, it was indeed 2x faster than Haskell's standard Data.List sort. So I wondered why this isn't used in practice and, suddenly, I realized the obvious: mergesort does NOT perform better on already-sorted lists. D'oh. So the whole assumption behind quacksort was flawed. Not only that, it would perform terribly on reverse-sorted lists, since it would fail to produce two halves of similar size (unless we could guess a really good pivot). So, quacksort is wack in all cases and should never be used in practice. But, then...

Why the hell does it perform 2x faster than Data.List's sort for random lists?

I can't think of a good reason this should be the case. Making each half smaller/bigger than a pivot doesn't change how many times merge must be called, so it shouldn't have any positive effect. But reverting it back to a conventional mergesort does make it 2x slower, so, for some reason, the ordered split helps.
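
To reproduce the comparison, here is a minimal benchmark sketch using the criterion library (just one way to measure it; it assumes the quacksort and randomList definitions above and would replace that module's main):

import Criterion.Main (bench, defaultMain, nf)
import Data.List (sort)

-- quacksort and randomList are assumed to be in scope from the code above.
main :: IO ()
main = do
  let l = randomList 0 2000000
  defaultMain
    [ bench "quacksort"      $ nf quacksort l
    , bench "Data.List.sort" $ nf sort l
    ]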


Solution

  • Your split separates the list into two halves where every element of the first is at most the pivot and every element of the second is above it, so after the recursive quacksort calls merge consumes its first argument completely and then just produces the second half in full. In other words it is equivalent to ++, doing redundant comparisons on the first half which always turn out to be True (see the worked trace below).

    In a true mergesort the merge does roughly twice the work on random data, because the two halves are not ordered relative to each other and have to be interleaved element by element.

    The split, though, spends some work on the partitioning, whereas an online bottom-up mergesort would spend no work there at all. But the built-in sort first tries to detect ordered runs in the input, and apparently that extra work is not negligible.
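
    For example, take two made-up halves that are already separated by a pivot; your merge then only ever takes the x < y branch of place, so it just rebuilds the first list in front of the second (a hand trace, not benchmark output):

        merge [1,2,3] [7,8,9]
          = 1 : merge [2,3] [7,8,9]
          = 1 : 2 : merge [3] [7,8,9]
          = 1 : 2 : 3 : merge [] [7,8,9]
          = [1,2,3,7,8,9]              -- same result as [1,2,3] ++ [7,8,9]

    And "detecting ordered runs" means roughly the following (a simplified sketch in the spirit of the library's natural mergesort, not GHC's actual code): split the input into maximal ascending runs first, then merge the runs pairwise until one list remains.

        runSort :: Ord a => [a] -> [a]
        runSort = mergeAll . runs where
          -- Split the input into maximal non-decreasing runs
          runs [] = []
          runs xs = let (run, rest) = ascending xs in run : runs rest
          ascending (x : y : zs)
            | x <= y = let (run, rest) = ascending (y : zs) in (x : run, rest)
          ascending (x : zs) = ([x], zs)
          ascending []       = ([], [])
          -- Merge the runs pairwise, bottom-up, until one sorted list remains
          mergeAll []   = []
          mergeAll [xs] = xs
          mergeAll xss  = mergeAll (mergePairs xss)
          mergePairs (xs : ys : xss) = merge xs ys : mergePairs xss
          mergePairs xss             = xss
          merge [] ys = ys
          merge xs [] = xs
          merge (x : xs) (y : ys)
            | x <= y    = x : merge xs (y : ys)
            | otherwise = y : merge (x : xs) ys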