I'm creating a web app in PHP where people can try to translate words they need to learn for school.
For example, someone needs to translate the Dutch word 'weer' to 'weather' in English, but unfortunately he types 'whether'. Because he almost typed the right word, I want to give him another try, with dots '.
' on the places where he made a mistake:
Language A: weer
Language B: weather
Input: whether
Output: w..ther
Or, for example
Language A: fout
Language B: mistake
Input: mitake
Output: mi.take
Or:
Language A: echt
Language B: genuine
Input: genuinely
Output: genuinely (almost good, shorten the word a little bit)
But, if the input differs too much from the desired translation, I don't want to get output like ........
I heard about Levenshtein distance and I think I need an algorithm that is much like that one, but I don't know how to place dots on the right place instead of echoing how many operations are to be done.
So, how can I return the misspelled word with dots on the places where someone made a mistake?
First, take a look at the Levenshtein algorithm on wikipedia.
Then go on and look at the examples and the resulting matrix d
on the article page:
*k* *i* *t* *t* *e* *n*
>0 1 2 3 4 5 6
*s* 1 >1 2 3 4 5 6
*i* 2 2 >1 2 3 4 5
*t* 3 3 2 >1 2 3 4
*t* 4 4 3 2 >1 2 3
*i* 5 5 4 3 2 >2 3
*n* 6 6 5 4 3 3 >2
*g* 7 7 6 5 4 4 >3
The distance is found on lower right corner of the matrix, d[m, n]. But from there it is now possible to follow backtrack the minimum steps to the upper left of of the matrix, d[1, 1]. You just go left, up-left, or up at each step whichever minimizes the path. In the example above, you'd find the path marked by the '>' signs:
s i t t i n g k i t t e n
0 1 1 1 1 2 2 3 0 1 1 1 1 2 2 3
^ ^ ^ ^ ^ ^
changes in the distance, replace by dots
Now you can find on the minimum path at which locations d[i,j] the distance changes (marked by ^ in the example above), and for those letters you put in the first (or second) word a dot at position i (or j respectively).
Result:
s i t t i n g k i t t e n
^ ^ ^ ^ ^ ^
. i t t . n . . i t t . n .