I was wondering how to implement the following problem: Say I have a 'set' of Strings and I wish to know which one is the most related to a given value.
Example:
String value= "ABBCCE";
Set contains: {"JJKKLL", "ABBCC", "AAPPFFEE", "AABBCCDD", "ABBCEE", "AABBCCEE"}
By 'most related' I assume there could be many options (valid one can be the last 2), but at least we can ignore some items (JJKKLLL
).
What should be the approach to solve this kind of a problem (that at minmum, a result like AABBCCEE
would be acceptable)
Any java code would be appreciated :-)
You could try using the Levenshtein Distance between your "target" string (e.g. "ABBCCE") and each element in your set. Pick a maximum threshold above which you will consider items to be unrelated (in your example here, a threshold of one or two perhaps), and reject everything in the set that has a Levenshtein Distance greater than that from the target string.
An example implementation of the Levenshtein Distance computation in Java can be found here.