Just got my feet wet in sorting algorithm with Haskell. I've implemented insertion-sort and merge-sort
insert_sort :: (Ord a, Show a) => [a] -> [a]
insert_sort keys = foldr f [] keys
where f key [] = [key]
f key acc = insert key acc
insert y [] = [y]
insert y (x:xs)
| x < y = x : insert y xs
| otherwise = y : x : xs
merge_sort :: (Ord a, Show a) => [a] -> [a]
merge_sort (x:[]) = [x]
merge_sort keys = merge (merge_sort (take len keys)) (merge_sort (drop len keys))
where len = length keys `div` 2
merge :: [a] -> [a] -> [a]
merge (x:xs) [] = (x:xs)
merge [] (y:ys) = (y:ys)
merge (x:xs) (y:ys) = if x <= y
then x : merge (xs) (y:ys)
else y : merge (x:xs) ys
Here's how I compared their efficiency:
insert_sort $ take 100000 $ randomRs (1,100000) $ mkStdGen 1 ::[Int]
merge_sort $ take 100000 $ randomRs (1,100000) $ mkStdGen 1 ::[Int]
Both of them starts to print out results after a short delay but merge-sort prints much faster. As we know, merge-sort is much faster than insertion-sort for large data sets. I thought that would be shown by how they give results (like a long delay versus a short one) not how they print results. Is it because I use foldr
in insertion-sort? What's behind the scene?
EDIT: Thx guys. I've heard about lazy evaluation since I started to learn Haskell but yet got the hang of it. Would anybody illustrate a bit more with a small data set, say [5,2,6,3,1,4]? How is it possible to output results before finish sorting with foldr
since the first elements comes at last?
Behind the scene is lazy evaluation. The start of the sorted lists is determined before the sort is complete, so it can be output before the work is finished. Since a mergesort is faster, the merge-sorted list is printed out faster.
As requested: how sorting [5,2,6,3,1,4]
proceeds. I use insert_sort = foldr ins []
for brevity.
insert_sort [5,2,6,3,1,4]
= foldr ins [] [5,2,6,3,1,4]
= 5 `ins` foldr ins [] [2,6,3,1,4]
= 5 `ins` 2 `ins` [6,3,1,4] ...
= 5 `ins` 2 `ins` 6 `ins` 3 `ins` 1 `ins` 4 `ins` []
= 5 `ins` 2 `ins` 6 `ins` 3 `ins` 1 `ins` (4:[])
= 5 `ins` 2 `ins` 6 `ins` 3 `ins` (1:4:[])
= 5 `ins` 2 `ins` 6 `ins` (1 : (3 `ins` (4:[])))
= 5 `ins` 2 `ins` (1 : (6 `ins` (3 `ins` (4:[]))))
= 5 `ins` (1 : (2 `ins` (6 `ins` (3 `ins` (4:[])))))
= 1 : (5 `ins` (2 `ins` (6 `ins` (3 `ins` (4:[]))))) -- now 1 can be output
= 1 : (5 `ins` (2 `ins` (6 `ins` (3:4:[]))))
= 1 : (5 `ins` (2 `ins` (3 : (6 `ins` (4:[])))))
= 1 : (5 `ins` (2 : (3 : (6 `ins` (4:[])))))
= 1 : 2 : (5 `ins` (3 : (6 `ins` (4:[])))) -- now 2 can be output
= 1 : 2 : 3 : (5 `ins` (6 `ins` (4:[]))) -- now 3
= 1 : 2 : 3 : (5 `ins` (4:6:[]))
= 1 : 2 : 3 : 4 : (5 `ins` (6:[])) -- now 4
= 1 : 2 : 3 : 4 : 5 : 6 : [] -- done
And merge sort (abbreviations: merge = mg
, merge_sort = ms
):
merge_sort [5,2,6,3,1,4]
= mg (ms [5,2,6]) (ms [3,1,4])
= mg (mg (ms [5]) (ms [2,6])) (mg (ms [3]) (ms [1,4]))
= mg (mg [5] (mg [2] [6])) (mg [3] (mg [1] [4]))
= mg (mg [5] [2,6]) (mg [3] [1,4])
= mg (2 : mg [5] [6]) (1 : mg [3] [4])
= 1 : mg (2 : mg [5] [6]) (mg [3] [4]) -- now 1 can be output
= 1 : mg (2 : mg [5] [6]) [3,4]
= 1 : 2 : mg (mg [5] [6]) [3,4] -- now 2 can be output
= 1 : 2 : mg [5,6] [3,4]
= 1 : 2 : 3 : mg [5,6] [4] -- now 3
= 1 : 2 : 3 : 4 : mg [5,6] [] -- now 4
= 1 : 2 : 3 : 4 : 5 : 6 : [] -- now 5 and 6
Admittedly I've taken a few short cuts, but Haskell isn't the only lazy one.