Search code examples
algorithmlevenshtein-distance

Improved Levenshtein Algorithm


I recently implemented the levenshtein algorithm into our search engine database, but we have come across a problem.

According to the basic levenshtein

Levenshtein('123456','12x456') is the same value as Levenshtein('123456', '12345x')

Normally this is fine, but for my specific problem that is incorrect. When someone uses our website, this is incorrect. Manufacturers of electronic components often make similar products with only a difference in the very last letter. If the first letter is different, it's usually an entirely different category. Therefore I need an algorithm that considers matches near the beginning of the word more valuable than those in the back or in other words, mismatches that occur near the beginning should apply a larger penalty than those in the back.

If anyone has any idea please let me know.


Solution

  • We had a similar issue and used the Brew edit distance method

    We were in Perl so we used the Text::Brew library. My co-worker did a nice presentation on using a few different algorithms, including Brew.