Search code examples
clojurememoizationlevenshtein-distance

Levenshtein implementation in Clojure with memoization


This is a minimal Levenshtein (edit distance) implementation using Clojure with recursion:

(defn levenshtein [s1, i1, s2, i2]
  (cond
    (= 0 i1) i2
    (= 0 i2) i1
    :else
    (min (+ (levenshtein s1 (- i1 1) s2 i2) 1)
         (+ (levenshtein s1 i1 s2 (- i2 1)) 1)
         (+ (levenshtein s1 (- i1 1) s2 (- i2 1)) (if (= (subs s1 i1 (+ i1 1)) (subs s2 i2 (+ i2 1))) 0 1))
         )
    )
  )

(defn levenshteinSimple [s1, s2]
  (levenshtein s1, (- (count s1) 1), s2, (- (count s2) 1)))

Which can be used like this:

(println (levenshteinSimple "hello", "hilloo"))
(println (levenshteinSimple "hello", "hilloo"))
(println (levenshteinSimple "bananas", "bananas"))
(println (levenshteinSimple "ananas", "bananas"))

And prints this:

2
2
0
1

How can you add memoize to this implementation to improve performance?

Please note: I am a Clojure beginner. These are my first lines in Clojure


Solution

  • The simplest way is to just make use of the memoize function. It takes a function, and returns a memoized function:

    (let [mem-lev (memoize levenshteinSimple]
      (println (mem-lev "hello", "hilloo"))
      (println (mem-lev "hello", "hilloo"))
      (println (mem-lev "bananas", "bananas"))
      (println (mem-lev "ananas", "bananas")))
    

    mem-lev will memorize every argument you give it and the result of what your function returns, and will return the cached result if it's already seen the arguments that you gave it.

    Note that this will not cause the recursive calls to become memoized, but it's unlikely that any recursive calls would benefit from memoization anyways.

    This will also not cause your original function to become memoized. In this example, only mem-lev will be memoized. If you really wanted to have your global function memoized, you could change your definition to something like:

    (def levenshteinSimple
      (memoize
        (fn [s1, s2]
          ...
    

    But I wouldn't recommend doing this. This causes the function itself to hold state, which isn't ideal. It'll also hold onto that state for the length of the program, which could cause memory issues if abused.

    (As a great exercise, try writing your own version of memoize. I learned a lot by doing that).