I have an employee class with the following fields.
class Employee {
    final int id;
    final String name;
    final LocalDate updatedDate;
    // constructor and getters (the fields are final, so there are no setters)
}
I have a list of employees; the list could contain duplicate employees with different updatedDate values. Now I want to create a set that has a unique entry for each employee id. If there are duplicate entries, the one with the latest updatedDate should be kept.
I came up with the solution below: sort by updatedDate in descending order and add the elements to a TreeSet, which maintains uniqueness on id. I could also use a HashSet by implementing hashCode and equals in Employee.
List<Employee> employees = new ArrayList<>();
// Sort by updatedDate, newest first
List<Employee> sortedList = employees.stream()
    .sorted(Collections.reverseOrder(Comparator.comparing(employee -> employee.updatedDate)))
    .collect(Collectors.toList());
// The TreeSet compares by id only, so for each id the first employee added (the newest) is kept
Set<Employee> employeeSet = new TreeSet<>(Comparator.comparing(employee -> employee.id));
sortedList.forEach(employeeSet::add);
Now the problem: most of the time the employee list will hold unique elements, and only very rarely will there be duplicate entries. Sorting when only unique entries are present will not scale well. Is there a way to improve on the above solution by avoiding the sort?
Another way to solve this is to use the groupingBy collector and then the collectingAndThen collector to find the employee with the latest updatedDate for each id.
I think this way is more readable and cleaner.
To keep it short, I imported the collectors statically.
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.maxBy;
import static java.util.stream.Collectors.groupingBy;
Collection<Employee> collection = employees.stream()
    .collect(groupingBy(Employee::getId,
        collectingAndThen(maxBy(Comparator.comparing(Employee::getUpdatedDate)),
            employee -> employee.orElse(null))))
    .values();
And then, if a List is needed:
List<Employee> result = new ArrayList<>(collection);
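For comparison, here is a minimal, self-contained sketch of a related single-pass variant that also avoids sorting. It swaps groupingBy for Collectors.toMap with a merge function, so it is a different technique from the one above, not a restatement of it. The class name LatestEmployees, the method latestPerId, the Employee constructor, and the sample data are all made up for illustration; the getters are assumed to match the ones used above.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LatestEmployees {

    // Stand-in for the Employee class from the question (constructor is an assumption).
    static final class Employee {
        final int id;
        final String name;
        final LocalDate updatedDate;

        Employee(int id, String name, LocalDate updatedDate) {
            this.id = id;
            this.name = name;
            this.updatedDate = updatedDate;
        }

        int getId() { return id; }
        String getName() { return name; }
        LocalDate getUpdatedDate() { return updatedDate; }
    }

    // Hypothetical helper: keeps, for each id, the employee with the latest updatedDate.
    // One pass over the list, no sorting; the merge function only runs when an id repeats.
    static List<Employee> latestPerId(List<Employee> employees) {
        return new ArrayList<>(employees.stream()
                .collect(Collectors.toMap(
                        Employee::getId,
                        Function.identity(),
                        BinaryOperator.maxBy(Comparator.comparing(Employee::getUpdatedDate))))
                .values());
    }

    public static void main(String[] args) {
        // Made-up sample data: id 1 appears twice with different dates.
        List<Employee> employees = Arrays.asList(
                new Employee(1, "Ann", LocalDate.of(2020, 1, 1)),
                new Employee(2, "Bob", LocalDate.of(2020, 2, 1)),
                new Employee(1, "Ann", LocalDate.of(2021, 3, 1)));

        for (Employee e : latestPerId(employees)) {
            System.out.println(e.getId() + " " + e.getName() + " " + e.getUpdatedDate());
        }
    }
}
```

Since duplicates are rare in this case, the merge function is seldom invoked and the cost is roughly one map insertion per employee. The order of the resulting list follows the underlying map and should not be relied on.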