Search code examples
c++algorithmpattern-matchinglevenshtein-distance

Levenshtein algorithm: How do I meet this text editing requirements?


I'm using levenshtein algorithm to meet these requirements:

When finding a word of N characters, the words to suggest as correction in my dictionary database are:

Every dictionary word of N characters that has 1 character of difference with the found word. Example: found word:bearn, dictionary word: bears

Every dictionary word of N+1 characters that has N characters equal to the found word. Example: found word:bear, dictionary word: bears

Every dictionary word of N-1 characters that has N-1 characters equal to the found word. Example: found word: bears, dictionary word: bear

I'm using this implementation of Levenshtein algorithm in C++ to find when a word has a Levenshtein number of 1 (which is the Levenshtein number for all three cases), but then how do I choose the word to suggest? I read about Boyer-Moore-Horspool and Knuth-Morris-Pratt but I'm not sure on how either of them can be helpful.

#include <string>
#include <vector>
#include <algorithm>

using namespace std;

int levenshtein(const string &s1, const string &s2)
{
   string::size_type N1 = s1.length();
   string::size_type N2 = s2.length();
   string::size_type i, j;
   vector<int> T(N2+1);

   for ( i = 0; i <= N2; i++ )
      T[i] = i;

   for ( i = 0; i < N1; i++ ) {
      T[0] = i+1;
      int corner = i;
      for ( j = 0; j < N2; j++ ) {
         int upper = T[j+1];
         if ( s1[i] == s2[j] )
            T[j+1] = corner;
         else
            T[j+1] = min(T[j], min(upper, corner)) + 1;
         corner = upper;
      }
   }
   return T[N2];
}

Solution

  • You may also want to add Norvig's excellent article on spelling correction to your reading.

    It's been a while since I've read it but I remember it being very similar to what your writing about.