Remove duplicate entries from a list and keep the entry with the latest updated LocalDate


I have an Employee class with the following fields.

class Employee {
    final int id;
    final String name;
    final LocalDate updatedDate;
    // constructor and getters (the fields are final, so there are no setters)
}

I have a list of employees. The list may contain duplicate employees with different updatedDate values. I want to create a set that has a unique entry for each employee id; if there are duplicate entries, the one with the latest updatedDate should be kept.

I came up with the solution below: sort by updatedDate in descending order and add the elements to a TreeSet that maintains uniqueness on id. I could use a HashSet instead by implementing hashCode and equals in Employee.

List<Employee> employees = new ArrayList<>();

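// sort by updatedDate, latest first
List<Employee> sortedList = employees.stream()
       .sorted(Collections.reverseOrder(Comparator.comparing(employee -> employee.updatedDate)))
       .collect(Collectors.toList());

// TreeSet.add ignores an element that compares equal to one already present,
// so for each id only the first (latest) entry is kept
Set<Employee> employeeSet = new TreeSet<>(Comparator.comparing(employee -> employee.id));
sortedList.forEach(employeeSet::add);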

Now the problem: most of the time the employee list will hold unique elements, and only very rarely will there be duplicate entries. Sorting when only unique entries are present will not scale well. Is there a way to improve on the above solution by avoiding sorting?


Solution

  • Another way to solve this is to use the groupingBy collector together with collectingAndThen and maxBy to pick, for each id, the employee with the latest updatedDate. This makes a single pass over the list, so no sorting is needed, and I think it is more readable and clean.

    To keep it short, I imported the collectors statically.

    import static java.util.stream.Collectors.collectingAndThen;
    import static java.util.stream.Collectors.maxBy;
    import static java.util.stream.Collectors.groupingBy;
    
    Collection<Employee> collection = employees.stream()
             .collect(groupingBy(Employee::getId,
                 collectingAndThen(maxBy(Comparator.comparing(Employee::getUpdatedDate)),
                                employee -> employee.orElse(null)))).values();
    

    and then, if you need a List:

    List<Employee> result = new ArrayList<>(collection);
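
    A related option, not part of the answer above, is Collectors.toMap with a merge function, which also keeps one entry per id in a single pass without sorting. The following is only a sketch under some assumptions: the class name LatestEmployees, the method latestById, the constructor and the sample data are made up for illustration, and Employee is a minimal stand-in for the class in the question.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class LatestEmployees {

    // Minimal stand-in for the Employee class from the question (assumed shape).
    static final class Employee {
        final int id;
        final String name;
        final LocalDate updatedDate;

        Employee(int id, String name, LocalDate updatedDate) {
            this.id = id;
            this.name = name;
            this.updatedDate = updatedDate;
        }

        int getId() { return id; }
        LocalDate getUpdatedDate() { return updatedDate; }
    }

    // Hypothetical helper: one pass over the list, no sorting.
    // When two employees share an id, the merge function keeps the one
    // with the later updatedDate.
    static List<Employee> latestById(List<Employee> employees) {
        Map<Integer, Employee> latest = employees.stream()
                .collect(Collectors.toMap(
                        Employee::getId,
                        Function.identity(),
                        BinaryOperator.maxBy(Comparator.comparing(Employee::getUpdatedDate))));
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        // Made-up sample data: id 1 appears twice with different dates.
        List<Employee> employees = Arrays.asList(
                new Employee(1, "Ann", LocalDate.of(2020, 1, 1)),
                new Employee(2, "Bob", LocalDate.of(2020, 3, 1)),
                new Employee(1, "Ann", LocalDate.of(2021, 6, 15)));

        // The order of the result is not guaranteed, because toMap
        // collects into a HashMap by default.
        latestById(employees).forEach(e ->
                System.out.println(e.id + " " + e.name + " " + e.updatedDate));
    }
}
```

    Compared with groupingBy, this avoids the intermediate Optional, since the merge function is only called when two elements collide on the same id.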