java, set, set-theory

Cull all duplicates in a set


I'm using Set to isolate the unique values of a List (in this case, I'm getting a set of points):

Set<PVector> pointSet = new LinkedHashSet<PVector>(listToCull);

This returns a set of unique points, but I want something stricter: if an item in listToCull has a duplicate, every copy of that item should be culled, including the first one. In other words, I want pointSet to contain only the items of listToCull that were already unique (every item in pointSet had no duplicate in listToCull). Any ideas on how to implement this?

EDIT - I think my first question needs more clarification. Below is some code which will execute what I'm asking for, but I'd like to know if there is a faster way. Assuming listToCull is a list of PVectors with duplicates:

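// One representative of each distinct point, in first-seen order.
Set<PVector> pointSet = new LinkedHashSet<PVector>(listToCull);
List<PVector> uniqueItemsInListToCull = new ArrayList<PVector>();
for (PVector pt : pointSet) {
    // Count how many times this point occurs in the original list.
    int counter = 0;
    for (PVector ptCheck : listToCull) {
        // equals() matches the comparison the Set uses; == would only compare references.
        if (pt.equals(ptCheck)) {
            counter++;
        }
    }
    // Keep the point only if it occurred exactly once.
    if (counter < 2) {
        uniqueItemsInListToCull.add(pt);
    }
}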

uniqueItemsInListToCull will be different from pointSet. I'd like to do this without loops if possible.


Solution

  • You will have to do some programming yourself: Create two empty sets; one will contain the unique elements, the other the duplicates. Then loop through the elements of listToCull. For each element, check whether it is in the duplicates set. If it is, ignore it. Otherwise, check whether it is in the unique elements set. If it is, remove it from there and add it to the duplicates set. Otherwise, add it to the unique elements set.

    If your PVector class has a good hashCode() method, HashSets are quite efficient, so the performance of this will not be too bad.

    Untested:

    Set<PVector> uniques = new HashSet<>();
    Set<PVector> duplicates = new HashSet<>();
    for (PVector p : listToCull) {
        if (!duplicates.contains(p)) {
            if (uniques.contains(p)) {
                uniques.remove(p);
                duplicates.add(p);
            }
            else {
                uniques.add(p);
            }
        }
    }
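
The two-set approach only works if equal points also produce equal hash codes, which is what the remark about hashCode() refers to. As a rough illustration, here is a minimal sketch of a point type with consistent equals() and hashCode(). GridPoint and GridPointDemo are hypothetical names made up for this example; this is not the real PVector class, and integer coordinates are assumed to sidestep floating-point comparison.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a point type; NOT the real PVector class.
final class GridPoint {
    final int x;
    final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Two points are equal when both coordinates match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint other = (GridPoint) o;
        return x == other.x && y == other.y;
    }

    // Must agree with equals(): equal points yield the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class GridPointDemo {
    public static void main(String[] args) {
        Set<GridPoint> seen = new HashSet<>();
        seen.add(new GridPoint(1, 2));
        // A distinct but equal object is found because equals/hashCode agree.
        System.out.println(seen.contains(new GridPoint(1, 2))); // prints true
    }
}
```

Without the two overrides, the HashSet would fall back to object identity, and two separately created points with the same coordinates would not be recognized as duplicates.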
    

    Alternatively, you may use a third-party library which offers a Bag or MultiSet. This allows you to count how many occurrences of each element are in the collection, and then at the end discard all elements whose count is not exactly 1.
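
The counting idea can also be sketched without a third-party library by using a plain Map as a makeshift multiset. The class name UniqueOnly and the method uniqueOnly below are made up for this illustration, and the sketch assumes the element type has consistent equals() and hashCode(); Integer elements are used only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UniqueOnly {
    // Returns the elements that occur exactly once in 'items', in first-seen order.
    static <T> List<T> uniqueOnly(List<T> items) {
        // LinkedHashMap keeps insertion order, similar to LinkedHashSet in the question.
        Map<T, Integer> counts = new LinkedHashMap<>();
        for (T item : items) {
            // Start at 1 for a new element, otherwise add 1 to the existing count.
            counts.merge(item, 1, Integer::sum);
        }
        List<T> result = new ArrayList<>();
        for (Map.Entry<T, Integer> entry : counts.entrySet()) {
            // Discard every element whose count is not exactly 1.
            if (entry.getValue() == 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 2, 3, 1, 4);
        System.out.println(uniqueOnly(input)); // prints [3, 4]
    }
}
```

This makes one pass over the list and one over the distinct elements, so with a reasonable hashCode() it should scale roughly linearly, unlike the nested loops in the question.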