java string levenshtein-distance

Is there an efficient implementation for quantifying the similarity between two Strings?


Let's say I have several very long Strings consisting of completely random characters. I want to express each one's similarity to a designated master String as a single number.

For example: 12345 is very similar to 23456, but not so similar to 12abcdef.

Assuming Java, is there already an efficient implementation of such an algorithm? I think Levenshtein distance (https://en.wikipedia.org/wiki/Levenshtein_distance) would probably do what I want, but I need something very efficient for super-long Strings.


Solution

  • I am not sure whether a ready-made Java library exists for it, but you can find a Java implementation of the Levenshtein distance here:

    http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Java

    good luck :)
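
For reference, here is a minimal sketch of what such an implementation might look like. It is not the code from the linked page. The class name `StringSimilarity` and both method names are made up for illustration, and normalising by the longer string's length is just one assumed way to turn the distance into a single similarity number. It keeps only two rows of the dynamic-programming table, so memory stays proportional to the shorter string, but the running time is still proportional to the product of the two lengths, which may be too slow for truly huge inputs.

```java
// Hypothetical helper class; the names are illustrative, not from any library.
final class StringSimilarity {
    private StringSimilarity() {}

    /** Levenshtein distance using two rolling rows instead of the full table. */
    static int distance(CharSequence a, CharSequence b) {
        // Make b the shorter input so the rows are as small as possible.
        if (a.length() < b.length()) {
            CharSequence tmp = a;
            a = b;
            b = tmp;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // i deletions turn a's prefix into the empty string
            char ca = a.charAt(i - 1);
            for (int j = 1; j <= b.length(); j++) {
                int cost = (ca == b.charAt(j - 1)) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Swap the rows; the old prev is overwritten on the next pass.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalisation: 1.0 means identical, 0.0 means nothing in common. */
    static double similarity(CharSequence a, CharSequence b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / longest;
    }
}
```

With the strings from the question, `StringSimilarity.distance("12345", "23456")` is 2 (drop the 1, append a 6), giving a similarity of 0.6, while `StringSimilarity.distance("12345", "12abcdef")` is 6, giving 0.25.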