The levenshtein distance for characters in strings can be computed with the lru_cache
:
from functools import lru_cache
from itertools import zip_longest
@lru_cache(maxsize=4095)
def ld(s, t):
"""
Levenshtein distance memoized implementation from Rosetta code:
https://rosettacode.org/wiki/Levenshtein_distance#Python
"""
if not s: return len(t)
if not t: return len(s)
if s[0] == t[0]: return ld(s[1:], t[1:])
l1 = ld(s, t[1:]) # Deletion.
l2 = ld(s[1:], t) # Insertion.
l3 = ld(s[1:], t[1:]) # Substitution.
return 1 + min(l1, l2, l3)
E.g.
>>> ld("hello how are the you ?", "how Halo how you are the ?")
13
However if I want to track the distance between 2 lists of words instead of characters in strings, without the lru_cache
it works:
from functools import lru_cache
from itertools import zip_longest
#@lru_cache(maxsize=4095)
def ld(s, t):
"""
Levenshtein distance memoized implementation from Rosetta code:
https://rosettacode.org/wiki/Levenshtein_distance#Python
"""
if not s: return len(t)
if not t: return len(s)
if s[0] == t[0]: return ld(s[1:], t[1:])
l1 = ld(s, t[1:]) # Deletion.
l2 = ld(s[1:], t) # Insertion.
l3 = ld(s[1:], t[1:]) # Substitution.
return 1 + min(l1, l2, l3)
Usage:
>>> ld("hello how are the you ?".split(), "how Halo how you are the ?".split())
However, without the LRU cache, it's not going to work with longer list of words.
And with the LRU cache, it throws a unhashable type error:
>>> ld("hello how are the you ?".split(), "how Halo how you are the ?".split())
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-1de95d00ebae> in <module>
----> 1 ld("hello how are the you ?".split(), "how Halo how you are the ?".split())
TypeError: unhashable type: 'list'
My questions are:
lru_cache
can be used? Since you are not modifying the lists, but only slicing, you may pass tuples, which are hashable.
s = "hello how are the you ?".split()
t = "how Halo how you are the ?".split()
ld(tuple(s), tuple(t))
Otherwise, you may avoid using lru_cached functions by using loops with extra space, where you memoize calculations