
Removing duplicates with the CollectionUtils.collate method


I'm trying to compute the union of two collections using the CollectionUtils.collate method from the org.apache.commons.collections4 package.

Here is the relevant code:

Collection<String> tokensUnion2 = CollectionUtils.collate(
    Arrays.asList(new String[]{"my", "sentence", "test", "for", "testing"}), 
    Arrays.asList(new String[]{"my", "sentence", "test", "is", "this"}), 
    false);

The resulting collection is:

[my, sentence, test, for, test, is, testing, this]

As you can see, the resulting collection contains duplicates, even though the third parameter of CollectionUtils.collate indicates that I don't want duplicates.

Also, the duplicate string "sentence" was eliminated, but the duplicate "test" is still there.

I could resolve this issue by simply putting the resulting collection in a HashSet, but I'd like to know what I've done wrong.

Thank you.


Solution

  • The collate method expects two sorted collections. The Javadoc for CollectionUtils#collate says: "Merges two sorted Collections, a and b, into a single, sorted List such that the natural ordering of the elements is retained."

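    In your example, the two lists supplied as arguments are not sorted. With unsorted input, equal elements do not necessarily come out of the merge next to each other, which appears to be why "sentence" was deduplicated while the second "test" (separated from the first by "for") was not. If you sort both lists before calling collate: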

    List<String> list1 = Arrays.asList(new String[] { "my", "sentence", "test", "for", "testing" });
    List<String> list2 = Arrays.asList(new String[] { "my", "sentence", "test", "is", "this" });
    
    Collections.sort(list1);
    Collections.sort(list2);
    
    Collection<String> tokensUnion2 = CollectionUtils.collate(list1, list2, false);
    

    This returns a sorted collection with no duplicates:

    [for, is, my, sentence, test, testing, this]
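
    To see why sorting matters, here is a minimal sketch of a two-way merge that skips an element only when it equals the one emitted immediately before it. This is a simplified model written for illustration, not the library's actual source; the class name CollateSketch and the method mergeSkippingAdjacentDuplicates are hypothetical. It does, however, reproduce both outputs shown in this thread.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class CollateSketch {

        // Simplified model (an assumption, not the Commons Collections code):
        // repeatedly take the smaller of the two head elements, and drop an
        // element only if it equals the element taken just before it.
        static List<String> mergeSkippingAdjacentDuplicates(List<String> a, List<String> b) {
            List<String> merged = new ArrayList<>();
            String last = null;
            int i = 0;
            int j = 0;
            while (i < a.size() || j < b.size()) {
                String next;
                if (j >= b.size() || (i < a.size() && a.get(i).compareTo(b.get(j)) <= 0)) {
                    next = a.get(i++);
                } else {
                    next = b.get(j++);
                }
                if (!next.equals(last)) {
                    merged.add(next);
                }
                last = next;
            }
            return merged;
        }

        public static void main(String[] args) {
            List<String> list1 = Arrays.asList("my", "sentence", "test", "for", "testing");
            List<String> list2 = Arrays.asList("my", "sentence", "test", "is", "this");

            // Unsorted input: "for" is emitted between the two "test" entries,
            // so the second one is not adjacent to the first and survives.
            System.out.println(mergeSkippingAdjacentDuplicates(list1, list2));
            // prints [my, sentence, test, for, test, is, testing, this]

            // Sorted input: equal elements are always emitted back to back.
            Collections.sort(list1);
            Collections.sort(list2);
            System.out.println(mergeSkippingAdjacentDuplicates(list1, list2));
            // prints [for, is, my, sentence, test, testing, this]
        }
    }
    ```

    With sorted input, equal elements always arrive consecutively, so comparing against the previous element is enough to remove every duplicate.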
    

    I hope this helps.
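
    If you do not need the result sorted and would rather keep the original encounter order, a set-based union using only the JDK is another option, along the lines of the HashSet workaround mentioned in the question. The following is a sketch; the class name OrderedUnion and the method union are illustrative names, not part of any library.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;

    public class OrderedUnion {

        // Union of two lists with duplicates removed, keeping first-seen order.
        // LinkedHashSet ignores repeated elements and iterates in insertion order.
        static List<String> union(List<String> a, List<String> b) {
            Set<String> seen = new LinkedHashSet<>(a);
            seen.addAll(b);
            return new ArrayList<>(seen);
        }

        public static void main(String[] args) {
            List<String> list1 = Arrays.asList("my", "sentence", "test", "for", "testing");
            List<String> list2 = Arrays.asList("my", "sentence", "test", "is", "this");

            System.out.println(union(list1, list2));
            // prints [my, sentence, test, for, testing, is, this]
        }
    }
    ```

    Unlike collate, this does not require sorted input, but it also does not produce a sorted result.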