Search code examples
javavectorsimilarityknn

Efficient way to Find most similar List<String>


I have a list1<String> and other 1000 list<String>. I need to choose the list with the most exact match values.

Today I go over each list<String> and compare to list1 , save the coverge in some sortedlist and in the end choose the most similar list.

public static <T> List<T> intersection(List<T> list1, List<T> list2) {
        List<T> list = new ArrayList<T>();

        for (T t : list1) {
            if(list2.contains(t)) {
                list.add(t);
            }
        }

        return list;
    }

This operation to go over all the 1000 unique lists is taken lost of time assuming I have lots of lists to compare it too.

Could you please suggest me an efficient way / algorithm to do it?


Solution

  • Your lists are not sorted, so any contains() operation needs to search the whole list (or until found so N/2 on average).
    So first sort (Collections.sort()) all lists, then use Collections.binarySearch() to find whether the String is contained or not. This needs then only (log N) instead of N/2 as before.