I am searching for an elegant way to filter a list for only the elements that are unique. An example:
[1, 2, 2, 3, 1, 4]
-> [3, 4] // 1 and 2 occur more than once
Most solutions I found manually compute the occurrences of all elements and then filter by the elements that have exactly one occurrence.
That does not sound too elegant to me, maybe there is a better solution, a best practice or a name for a data-structure that solves this already? I was also thinking about maybe utilizing streams, but I do not know how.
Note that I am not asking for duplicate removal, i.e. [1, 2, 3, 4]
but for keeping only the unique elements, so [3, 4]
.
The order of the resulting list or what type of Collection
exactly does not matter to me.
I doubt there is a better approach than actually counting and filtering for the ones that only appeared once. At least, all approaches I can think of will use something similar to that under the hood.
Also, it is not clear what you mean by elegant, readability or performance? So I will just dump some approaches.
Stream
countingHere is a stream-variant that computes number of occurrences (Map
) and then filters for elements that appear only once. It is essentially the same as what you described already, or what Bag
s do under the hood:
List<E> result = elements.stream() // Stream<E>
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting())) // Map<E, Long>
.entries() // Set<Entry<E, Long>>
.stream() // Stream<Entry<E, Long>>
.filter(entry -> entry.getValue() == 1)
.map(Entry::getKey)
.collect(Collectors.toList());
It requires two full iterations over the data-set. Since it uses the Stream-API, the operations support multi-threading right from the get-go though. So if you have lots of elements, this might be pretty fast due to that.
Set
Here is another approach that reduces iteration and lookup time by manually collecting into a Set
to identify duplicates as fast as possible:
Set<E> result = new HashSet<>();
Set<E> appeared = new HashSet<>();
for (E element : elements) {
if (result.contains(element)) { // 2nd occurrence
result.remove(element);
appeared.add(element);
continue;
}
if (appeared.contains(element)) { // >2nd occurrence
continue;
}
result.add(element); // 1st occurrence
}
As you see, this only requires one iteration over the elements
instead of multiple.
This approach is elegant in a sense that it does not compute unnecessary information. For what you want, it is completely irrelevant to compute how often exactly elements appear. We only care for "does it appear once or more often?" and not if it appears 5 times or 11 times.