Search code examples
javastringalgorithmsetedit-distance

Is there an algorithm for fuzzy search like Levenshtein Distance specialised for a set of ordered character?


I found an algorithm (on https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance) and after reading a bit more about levenshtein, I understood there should be a better way of telling the edit distance of twwo strings if these strings are strictly composed of ascii-aphabetically ordered and unique chars.

Meaning, for every a and b like a < b, a will be prior to b, and the reciprocal (or contraposed or I don't remember) for every a, b, and c like a < b < c, if one strings reads ac and the other ab, one knows for sure the first one does not contain the b.

And that precisely means there is a better way of determining the edit distance between two strings of this kind.

If it is any useful, the class I'm using to organize my characters is a TreeSet of Character.


Solution

  • This is the solution I came up with : I used it with values : String tested, String target, 1, 0.

    /** returns the cost of the difference between a tested CharSequence and a target CharSequence. CS = CharSequence
     * @param tested input, the CS which will be compared to the target. all letters sorted by ASCII order and unique
     * @param target is the CS to which the tested will be compared. all letters sorted by ASCII order and unique
     * @param positiveDifferenceCost is the cost to add when a letter is in the tested CS but not in the target.
     * @param negativeDifferenceCost is the cost to add when a letter is in the target CS but not in the tested.
     * @return int the number of differences.
     */
    
    
    public static int oneSidedOAUDistance(final CharSequence tested, final CharSequence target,
                                          final int positiveDifferenceCost, final int negativeDifferenceCost) {
        int diffCount = 0;
        int index_tested = 0;
        int index_target = 0;
    
        if (positiveDifferenceCost == 0 && negativeDifferenceCost == 0)
            return 0;
    
    
        for (; index_tested < tested.length() && index_target < target.length(); ) {
            if (tested.charAt(index_tested) == target.charAt(index_target)) {
                ++index_tested;
                ++index_target;
                continue;
            }
            if (tested.charAt(index_tested) < target.charAt(index_target)) {
                //some letters should not be there in tested string.
                diffCount+= positiveDifferenceCost;
                index_tested++;
    
            } else {
                //some letters miss in tested string.
                diffCount+=negativeDifferenceCost;
                index_target++;
            }
        }
    
    
        return diffCount;
    }