java lambda filtering levenshtein-distance

Java lambda: create a filter with a predicate that determines whether the Levenshtein distance is greater than 2


I have a query that returns the most similar value. I also need to cap the Levenshtein distance of the result: if the score is greater than 2, I don't want the value to be part of the recommendation.

String recommendation =  candidates.parallelStream()
            .map(String::trim) 
            .filter(s -> !s.equals(search))
            .min((a, b) -> Integer.compare(
              cache.computeIfAbsent(a, k -> StringUtils.getLevenshteinDistance(Arrays.stream(search.split(" ")).sorted().toString(), Arrays.stream(k.split(" ")).sorted().toString()) ),
              cache.computeIfAbsent(b, k -> StringUtils.getLevenshteinDistance(Arrays.stream(search.split(" ")).sorted().toString(), Arrays.stream(k.split(" ")).sorted().toString()))))
            .get();

Solution

  • Your question is about a single filtering operation: how to exclude elements whose score is greater than 2. You need to write a predicate for that. The simplest form of such a predicate, written without knowing any details about the rest of your application logic, is the following:

    .filter(s -> StringUtils.getLevenshteinDistance(search, s) <= 2)
    

    Considering that you cache the Levenshtein scores in a HashMap, the predicate should be rewritten this way:

    .filter(s -> cache.computeIfAbsent(s, k -> StringUtils.getLevenshteinDistance(search, k)) <= 2)
    

    Now, if you want to do anything else with the elements like splitting, reordering and joining them, you can further enhance this code, but that's outside of the scope of your question.

    Nevertheless, speaking of the splitting/joining, let me correct an error in your code. The line

    Arrays.stream(search.split(" ")).sorted().toString()
    

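    does not do anything useful. Calling toString() on a Stream does not give you the sorted words; it returns the default Object.toString() representation of the Stream instance (class name plus hash code). I guess you wanted this instead: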

    Arrays.stream(s.split(" ")).sorted().collect(Collectors.joining(" "))
    

    This code reorders the words of a string alphabetically: "Malus Casus" -> "Casus Malus"
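Putting the pieces above together, here is a minimal sketch of how the cached predicate and the corrected word sorting could sit in one pipeline. Several things in it are assumptions rather than part of the original code: the class name RecommendationSketch, the helpers normalize and recommend, and the maxDistance parameter are made up for illustration, and the small levenshtein method is only a stand-in for StringUtils.getLevenshteinDistance so that the sketch runs without Apache Commons Lang. It also uses a ConcurrentHashMap for the cache, since a plain HashMap is not safe to update from a parallel stream, and it returns an Optional instead of calling get(), because the filter can leave the stream empty.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

class RecommendationSketch {

    // Stand-in for StringUtils.getLevenshteinDistance (assumption: used only
    // so the sketch has no external dependency).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = curr;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: sorts the words of a phrase alphabetically,
    // e.g. "Malus Casus" -> "Casus Malus".
    static String normalize(String phrase) {
        return Arrays.stream(phrase.trim().split(" "))
                .sorted()
                .collect(Collectors.joining(" "));
    }

    // Hypothetical helper: closest candidate whose distance is at most
    // maxDistance, or Optional.empty() if no candidate qualifies.
    static Optional<String> recommend(List<String> candidates, String search, int maxDistance) {
        String normalizedSearch = normalize(search);
        // ConcurrentHashMap because the cache is written from a parallel stream.
        Map<String, Integer> cache = new ConcurrentHashMap<>();
        return candidates.parallelStream()
                .map(String::trim)
                .filter(s -> !s.equals(search))
                // The predicate from the answer: keep only scores <= maxDistance.
                .filter(s -> cache.computeIfAbsent(s,
                        k -> levenshtein(normalizedSearch, normalize(k))) <= maxDistance)
                // Every element that reaches min() already has a cached score.
                .min((a, b) -> Integer.compare(cache.get(a), cache.get(b)));
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("Malus Cassis", "Malus Casos", "Prunus Avium");
        System.out.println(recommend(candidates, "Malus Casus", 2));   // Optional[Malus Casos]
        System.out.println(recommend(candidates, "Quercus Robur", 2)); // Optional.empty
    }
}
```

With this shape the caller decides what to do when nothing is close enough, for example recommend(candidates, search, 2).orElse(null), instead of getting a NoSuchElementException from get().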