Let's say I have several very long Strings consisting of completely random characters. I want to express how similar each of them is to one designated master String as a single number.
For example: 12345 is very similar to 23456, but not so similar to 12abcdef.
Assuming Java, is there already an efficient implementation of such an algorithm? For example, I think this would probably do what I want: https://en.wikipedia.org/wiki/Levenshtein_distance but I need something very efficient for super-long Strings.
I am not sure if there is a built-in Java implementation for it, but you can find an implementation of the algorithm you mention here:
http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Java
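
For very long Strings, memory is likely to be the first thing that hurts, since the textbook version fills a full (n+1) x (m+1) table. Below is a minimal sketch (my own, not the code from the link) that keeps only two rows of that table at a time. The class name `Levenshtein` and the `similarity` normalization (1.0 means identical, 0.0 means nothing in common) are assumptions I made for illustration. The running time is still proportional to the product of the two lengths, so this may not be fast enough for truly huge inputs.

```java
final class Levenshtein {

    // Edit distance (insert, delete, substitute; each costs 1).
    // Uses two rolling rows, so extra memory is proportional to the shorter input.
    static int distance(CharSequence a, CharSequence b) {
        // Put the shorter sequence along the row to keep the arrays small.
        if (a.length() < b.length()) {
            CharSequence tmp = a;
            a = b;
            b = tmp;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting i characters of a
            char ca = a.charAt(i - 1);
            for (int j = 1; j <= b.length(); j++) {
                int cost = (ca == b.charAt(j - 1)) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Swap rows instead of allocating new ones.
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    // Hypothetical normalization: scales the distance into the range [0.0, 1.0].
    static double similarity(CharSequence master, CharSequence other) {
        int longest = Math.max(master.length(), other.length());
        if (longest == 0) {
            return 1.0; // two empty strings are identical
        }
        return 1.0 - (double) distance(master, other) / longest;
    }
}
```

With the examples from the question, `Levenshtein.distance("12345", "23456")` gives 2 and `Levenshtein.distance("12345", "12abcdef")` gives 6, so the first pair scores as the more similar one.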
good luck :)