Tags: algorithm, nlp, levenshtein-distance, jaro-winkler

What string distance algorithm is best for measuring typing accuracy?


I'm trying to write a function that measures how accurately the user typed a particular phrase, sentence, or word. My objective is to build an app that trains users to type certain phrases accurately.

My initial instinct is to use the basic Levenshtein distance algorithm (mostly because it's the only one I knew off the top of my head).

But after a bit more research, I saw that Jaro-Winkler is a slightly more interesting algorithm because of its consideration for transpositions.

I even found a link that talks about the differences between these algorithms:

Difference between Jaro-Winkler and Levenshtein distance?

Having read all that, in addition to the respective Wikipedia articles, I am still unsure which algorithm best fits my objective.


Solution

  • Since you are grading the quality of typing, and you want to train the student to make zero mistakes, you should use Levenshtein distance, because it is less forgiving: every inserted, missing, or wrong character adds one to the distance.

    Additionally, the Levenshtein score is more intuitive to understand, and easier to represent graphically, than Jaro-Winkler results. You can modify the Levenshtein algorithm to report insertions, deletions, and mistypes separately, and show end users a list of corrections. Jaro-Winkler, on the other hand, gives you a similarity score that is hard to explain to an end user, because the penalty depends on where the error occurs: the Winkler prefix bonus means a mistake in the first few characters costs more than the same mistake later in the string.
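To illustrate the idea of reporting insertions, deletions, and mistypes separately, here is a minimal Python sketch. It fills the standard Levenshtein table and then walks back through it to count each kind of edit. The function name `edit_breakdown` and the result keys are made up for this example, and when several optimal alignments exist it simply prefers treating a difference as a mistype, which is one reasonable choice rather than the only one.

```python
def edit_breakdown(expected: str, typed: str) -> dict:
    """Levenshtein distance between expected and typed, split by edit type.

    Hypothetical helper for illustration; names and keys are assumptions.
    """
    m, n = len(expected), len(typed)

    # dist[i][j] = edits needed to turn expected[:i] into typed[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if expected[i - 1] == typed[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # user left out an expected character
                dist[i][j - 1] + 1,         # user typed an extra character
                dist[i - 1][j - 1] + cost,  # match (cost 0) or mistype (cost 1)
            )

    # Walk back from the bottom-right corner to classify each edit.
    counts = {"insertions": 0, "deletions": 0, "mistypes": 0}
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if expected[i - 1] == typed[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                counts["mistypes"] += cost
                i -= 1
                j -= 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletions"] += 1   # expected character is missing
            i -= 1
        else:
            counts["insertions"] += 1  # typed character is extra
            j -= 1

    counts["distance"] = dist[m][n]
    return counts


if __name__ == "__main__":
    print(edit_breakdown("kitten", "sitting"))
    print(edit_breakdown("the", "teh"))
```

Note that plain Levenshtein counts a swapped pair such as "the" typed as "teh" as two edits. If you want a swap to count as a single mistake, the Damerau-Levenshtein variant adds a transposition operation.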