Search code examples
algorithmsortinghaskelllazy-evaluation

the way merge-sort faster than insertion-sort puzzles me


Just got my feet wet in sorting algorithm with Haskell. I've implemented insertion-sort and merge-sort

insert_sort :: (Ord a, Show a) => [a] -> [a]
insert_sort keys = foldr f [] keys
           where f key []        = [key]
                 f key acc       = insert key acc
                 insert y []     = [y]
                 insert y (x:xs)
                     | x < y     = x : insert y xs
                     | otherwise = y : x : xs

merge_sort :: (Ord a, Show a) => [a] -> [a]
merge_sort (x:[]) = [x]
merge_sort keys   = merge  (merge_sort (take len keys)) (merge_sort (drop len keys))
      where len         = length keys `div` 2
            merge :: [a] -> [a] -> [a]
            merge (x:xs) []     = (x:xs)
            merge []     (y:ys) = (y:ys)
            merge (x:xs) (y:ys) = if x <= y
                                  then x : merge (xs) (y:ys)
                                  else y : merge (x:xs) ys

Here's how I compared their efficiency:

insert_sort $ take 100000 $ randomRs (1,100000) $ mkStdGen 1 ::[Int]
merge_sort $ take 100000 $ randomRs (1,100000) $ mkStdGen 1 ::[Int]

Both of them starts to print out results after a short delay but merge-sort prints much faster. As we know, merge-sort is much faster than insertion-sort for large data sets. I thought that would be shown by how they give results (like a long delay versus a short one) not how they print results. Is it because I use foldr in insertion-sort? What's behind the scene?

EDIT: Thx guys. I've heard about lazy evaluation since I started to learn Haskell but yet got the hang of it. Would anybody illustrate a bit more with a small data set, say [5,2,6,3,1,4]? How is it possible to output results before finish sorting with foldr since the first elements comes at last?


Solution

  • Behind the scene is lazy evaluation. The start of the sorted lists is determined before the sort is complete, so it can be output before the work is finished. Since a mergesort is faster, the merge-sorted list is printed out faster.

    As requested: how sorting [5,2,6,3,1,4] proceeds. I use insert_sort = foldr ins [] for brevity.

    insert_sort [5,2,6,3,1,4]
      = foldr ins [] [5,2,6,3,1,4]
      = 5 `ins` foldr ins [] [2,6,3,1,4]
      = 5 `ins` 2 `ins` [6,3,1,4] ...
      = 5 `ins` 2 `ins` 6 `ins` 3 `ins` 1 `ins` 4 `ins` []
      = 5 `ins` 2 `ins` 6 `ins` 3 `ins` 1 `ins` (4:[])
      = 5 `ins` 2 `ins` 6 `ins` 3 `ins` (1:4:[])
      = 5 `ins` 2 `ins` 6 `ins` (1 : (3 `ins` (4:[])))
      = 5 `ins` 2 `ins` (1 : (6 `ins` (3 `ins` (4:[]))))
      = 5 `ins` (1 : (2 `ins` (6 `ins` (3 `ins` (4:[])))))
      = 1 : (5 `ins` (2 `ins` (6 `ins` (3 `ins` (4:[])))))  -- now 1 can be output
      = 1 : (5 `ins` (2 `ins` (6 `ins` (3:4:[]))))
      = 1 : (5 `ins` (2 `ins` (3 : (6 `ins` (4:[])))))
      = 1 : (5 `ins` (2 : (3 : (6 `ins` (4:[])))))
      = 1 : 2 : (5 `ins` (3 : (6 `ins` (4:[]))))            -- now 2 can be output
      = 1 : 2 : 3 : (5 `ins` (6 `ins` (4:[])))              -- now 3
      = 1 : 2 : 3 : (5 `ins` (4:6:[]))
      = 1 : 2 : 3 : 4 : (5 `ins` (6:[]))                    -- now 4
      = 1 : 2 : 3 : 4 : 5 : 6 : []                          -- done
    

    And merge sort (abbreviations: merge = mg, merge_sort = ms):

    merge_sort [5,2,6,3,1,4]
      = mg (ms [5,2,6]) (ms [3,1,4])
      = mg (mg (ms [5]) (ms [2,6])) (mg (ms [3]) (ms [1,4]))
      = mg (mg [5] (mg [2] [6])) (mg [3] (mg [1] [4]))
      = mg (mg [5] [2,6]) (mg [3] [1,4])
      = mg (2 : mg [5] [6]) (1 : mg [3] [4])
      = 1 : mg (2 : mg [5] [6]) (mg [3] [4])                -- now 1 can be output
      = 1 : mg (2 : mg [5] [6]) [3,4]
      = 1 : 2 : mg (mg [5] [6]) [3,4]                       -- now 2 can be output
      = 1 : 2 : mg [5,6] [3,4]
      = 1 : 2 : 3 : mg [5,6] [4]                            -- now 3
      = 1 : 2 : 3 : 4 : mg [5,6] []                         -- now 4
      = 1 : 2 : 3 : 4 : 5 : 6 : []                          -- now 5 and 6
    

    Admittedly I've taken a few short cuts, but Haskell isn't the only lazy one.