Tags: algorithm, string, comparison, filtering, ranking

What string similarity algorithms are there?


I need to compare two strings and calculate their similarity, so that I can filter a list down to the strings most similar to a search term.

e.g. searching for "dog" would return

  1. dog
  2. doggone
  3. bog
  4. fog
  5. foggy

e.g. searching for "crack" would return

  1. crack
  2. wisecrack
  3. rack
  4. jack
  5. quack

I have come across:

What other string similarity algorithms are there?


Solution

  • It seems you need some kind of fuzzy matching. Here is a Java implementation of a set of similarity metrics: http://www.dcs.shef.ac.uk/~sam/stringmetrics.html. A more detailed explanation of string metrics is here: http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf. Which metric to choose depends on how fuzzy and how fast your implementation must be.
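
To make the idea of fuzzy matching concrete, here is a minimal sketch in Python that ranks candidates with one common metric, Levenshtein edit distance. The answer above does not single out this metric, so treat it as an illustrative assumption; the function names `levenshtein`, `similarity`, and `rank` are hypothetical helpers and do not come from the linked libraries.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def rank(query: str, candidates: List[str], limit: int = 5) -> List[str]:
    """Return up to `limit` candidates, most similar to the query first."""
    return sorted(candidates, key=lambda c: similarity(query, c), reverse=True)[:limit]


if __name__ == "__main__":
    print(rank("dog", ["dog", "doggone", "bog", "fog", "foggy"]))
```

With plain edit distance, "bog" and "fog" (one substitution each) score higher than "doggone" (four insertions), so the order differs from the one in the question. A metric that rewards shared prefixes or substrings would probably be closer to that ordering, which is one way the choice depends on how fuzzy the matching needs to be.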