haskell memoization levenshtein-distance

Using Data.Memocombinators to implement edit distance algorithm

Lets say I wanted to implement the usual dynamic programming algorithm for Levensthein distance (edit distance). It is quite easy to come up with the recursion:

editDistance [] ys = length ys
editDistance xs [] = length xs
editDistance (x:xs) (y:ys) 
  | x == y = editDistance xs ys
  | otherwise = minimum [
      1 + editDistance xs (y:ys),
      1 + editDistance (x:xs) ys,
      1 + editDistance xs ys]

This suffers from exponential running time, so it is necessary to memoize the function. I want to do so by using Data.Memocombinators, and I have tried several ways. Here is my current attempt:

import qualified Data.MemoCombinators as M

memLength str = 
  M.wrap 
    (\i -> drop i str) 
    (\xs -> length str - length xs)
    (M.arrayRange (0,length str))

elm xs ys = (M.memo2 memListx memListy editDistance) xs ys
  where
    memListx = memLength xs
    memListy = memLength ys

However, the memoization seems not to have any effect, although I would expect any memoization to have a noticeable improvement to running time since it would at least be polynomial. What is wrong with my implementation? How can I get an ok running time while preserving as much as possible of the high level definition of edit distance?

Solution

If the code you posted is actually what you're doing, then you're doing it wrong! :-). If you are going to memoize a recursive function, you need to have the calls to the recursive version call back into the memoized version. So eg.:

editDistance (x:xs) (y:ys)
  | x == y = elm xs ys
  | ...

Otherwise you do the full recursive computation and only store the final result. You need to store intermediate results.

There is another issue here. The memo table for elm should not depend on its arguments (ideally you shouldn't even mention the arguments, so you don't depend on the compiler being smart enough to figure out the dependencies). If the memo table depends on the arguments, then you have to build a new table for each different argument, and we need to share a table for all arguments. You could try something dumb like memoizing on the whole structure of the arguments:

elm = M.memo2 (M.list M.char) (M.list M.char)

See if that speeds it up (incorporating the former trick). Then you can move on to trying to use just the lengths instead of the entire list structure for an extra boost.

Hope that helped.