I'm playing around with calculating Levenshtein distances in Haskell, and am a little frustrated with the following performance problem. If you implement it the most 'normal' way for Haskell, like below (dist), everything works just fine:
import qualified Data.List as L

dist :: (Ord a) => [a] -> [a] -> Int
dist s1 s2 = ldist s1 s2 (L.length s1, L.length s2)

-- (written without the old n+k patterns, which GHC no longer accepts)
ldist :: (Ord a) => [a] -> [a] -> (Int, Int) -> Int
ldist _ _ (0, 0) = 0
ldist _ _ (i, 0) = i
ldist _ _ (0, j) = j
ldist s1 s2 (i, j) = output
  where output | s1 !! (i - 1) == s2 !! (j - 1) = ldist s1 s2 (i - 1, j - 1)
               | otherwise = 1 + L.minimum [ ldist s1 s2 (i - 1, j - 1)
                                           , ldist s1 s2 (i,     j - 1)
                                           , ldist s1 s2 (i - 1, j) ]
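For example, the classic test pair gives the expected distance of 3:

ghci> dist "kitten" "sitting"
3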
But, if you bend your brain a little and implement it as dist', it executes MUCH faster (about 10x).
dist' :: (Ord a) => [a] -> [a] -> Int
dist' o1 o2 = (levenDist o1 o2 [[]]) !! 0 !! 0

-- rows are built by prepending, so arr !! 0 !! 0 ends up being the
-- bottom-right corner of the matrix
levenDist :: (Ord a) => [a] -> [a] -> [[Int]] -> [[Int]]
levenDist s1 s2 [[]]        = levenDist s1 s2 [[0]]
levenDist s1 s2 arr@([]:xs) = levenDist s1 s2 ([L.length arr - 1] : xs)
levenDist s1 s2 arr@(x:xs)  =
  let n1  = L.length s1
      n2  = L.length s2
      n_i = L.length arr
      n_j = L.length x
      match | s2 !! (n_j - 1) == s1 !! (n_i - 2) = True
            | otherwise                          = False
      minCost = if match
                  then (xs !! 0) !! (n2 - n_j + 1)
                  else L.minimum [ 1 + (xs !! 0) !! (n2 - n_j + 1)
                                 , 1 + (xs !! 0) !! (n2 - n_j)
                                 , 1 + x !! 0 ]
      -- the seq/($) mixture needs parentheses to parse as intended
      dist | n_i > n1 && n_j > n2 = arr
           | n_j > n2  = ([] : arr)           `seq` levenDist s1 s2 ([] : arr)
           | n_i == 1  = ((n_j : x) : xs)     `seq` levenDist s1 s2 ((n_j : x) : xs)
           | otherwise = ((minCost : x) : xs) `seq` levenDist s1 s2 ((minCost : x) : xs)
  in dist
I've tried all the usual seq tricks in the first version, but nothing seems to speed it up. This is a little unsatisfying, because I expected the first version to be faster: it doesn't need to evaluate the entire matrix, only the parts it needs.
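By the way, a minimal sketch of how one might measure the difference using the criterion library (this assumes both dist and dist' are in scope, and keeps the inputs short so the naive version finishes):

import Criterion.Main

-- Benchmarks both versions on the same small input pair; longer
-- inputs should widen the gap considerably.
main :: IO ()
main = defaultMain
  [ bench "dist"  $ whnf (uncurry dist)  ("kitten", "sitting")
  , bench "dist'" $ whnf (uncurry dist') ("kitten", "sitting")
  ]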
Does anyone know if it is possible to get these two implementations to perform similarly, or am I just reaping the benefits of tail-recursion optimizations in the latter, and therefore need to live with its unreadability if I want performance?
Thanks, Orion
I don't follow all of your second attempt just yet, but as far as I recall the idea behind the Levenshtein algorithm is to save repeated calculation by using a matrix. In the first piece of code, you are not sharing any calculation, and thus you will be repeating lots of calculations. For example, when calculating ldist s1 s2 (5, 5) you'll make the calculation for ldist s1 s2 (4, 4) at least three separate times (once directly, once via ldist s1 s2 (4, 5), and once via ldist s1 s2 (5, 4)).
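To see how quickly that blows up, here's a rough counter (my sketch; it assumes the worst case where no characters ever match, so all three branches are taken at every step):

-- Counts the ldist invocations made for subproblem (i, j) when every
-- character comparison is a mismatch.
calls :: (Int, Int) -> Integer
calls (0, _) = 1
calls (_, 0) = 1
calls (i, j) = 1 + calls (i - 1, j - 1) + calls (i - 1, j) + calls (i, j - 1)

Already calls (5, 5) is 2524, and it grows exponentially from there.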
What you should do is define an algorithm for generating the matrix (as a list of lists, if you like). I think this is what your second piece of code is doing, but it seems to calculate the matrix in a top-down manner rather than building it up cleanly in an inductive style (the recursive calls in the base cases are quite unusual to my eye). Unfortunately I don't have time to write out the whole thing, but thankfully someone else has: look at the first version at this address: http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Levenshtein_distance#Haskell
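For reference, that inductive style looks roughly like the following sketch (modelled on the wikibooks version, not copied from it; the name levenshtein is mine):

-- Fold over s2, transforming the row of distances for all prefixes of
-- s1. Each row is built once from the previous one, so every cell is
-- computed exactly once and shared by the three cells that consume it.
levenshtein :: Eq a => [a] -> [a] -> Int
levenshtein s1 s2 = last (foldl transform [0 .. length s1] s2)
  where
    transform row@(r:rs) c = scanl compute (r + 1) (zip3 s1 row rs)
      where
        -- z: cell to the left; diag: cell up-left; above: cell above
        compute z (c', diag, above) =
          minimum [above + 1, z + 1, diag + fromEnum (c /= c')]

With this, levenshtein "kitten" "sitting" is 3, the work is O(n*m), and conceptually only one previous row is needed at a time.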
Two more things. First, I'm not sure the Levenshtein algorithm can ever use only part of the matrix anyway, since each entry depends on its diagonal, vertical, and horizontal neighbours: when you need the value for one corner, you'll inevitably have to evaluate the matrix all the way to the other corner. Second, that match | foo = True | otherwise = False line can be replaced by simply match = foo.
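Applied to your code, that last point is simply:

match = s2 !! (n_j - 1) == s1 !! (n_i - 2)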