java collections java-8 java-stream maxby

Java 8 filtering list of objects by unique name while only keeping highest ID?


Assume we have a Person class with these fields:

class Person {
  private String name;
  private Integer id; // unique
}

And then we have a List<Person> people such that:

['Jerry', 993]
['Tom', 3]
['Neal', 443]
['Jerry', 112]
['Shannon', 259]
['Shannon', 533]

How can I make a new List<Person> uniqueNames that contains each name only once AND keeps the entry with the highest ID for that name?

So the end list would look like:

['Jerry', 993]
['Tom', 3]
['Neal', 443]
['Shannon', 533]

Solution

  • Collectors.groupingBy combined with Collectors.maxBy should do the trick: group the persons by name into a map, and within each group keep only the person with the highest ID:

    List<Person> persons = Arrays.asList(
        new Person("Jerry", 123),
        new Person("Tom", 234),
        new Person("Jerry", 456),
        new Person("Jake", 789)
    );
    
    List<Person> maxById = persons
        .stream()
        .collect(Collectors.groupingBy(
            Person::getName, 
            Collectors.maxBy(Comparator.comparingInt(Person::getID))
        ))
        .values() // Collection<Optional<Person>>
        .stream() // Stream<Optional<Person>>
        .map(opt -> opt.orElse(null)) // every group has at least one element, so the Optional is never empty here
        .collect(Collectors.toList());
    
    System.out.println(maxById);
    

    Output (the order may differ, because groupingBy gives no ordering guarantee for the map it returns):

    [789: Jake, 234: Tom, 456: Jerry]
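
For reference, the approach above can be sketched as a self-contained program that uses the data from the question. The Person class shown here (constructor, getName, getID, toString) is an assumption, since the answer does not list it. Passing LinkedHashMap::new and unwrapping with collectingAndThen(..., Optional::get) are optional variations: they keep the names in first-encounter order and avoid a second stream over the Optional values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class MaxIdPerNameDemo {
    // Hypothetical Person class; the answer does not show its own version.
    static class Person {
        private final String name;
        private final Integer id; // unique

        Person(String name, Integer id) {
            this.name = name;
            this.id = id;
        }

        String getName() { return name; }
        Integer getID() { return id; }

        @Override
        public String toString() { return id + ": " + name; }
    }

    // Keeps one Person per name (the one with the highest ID),
    // with names in the order they first appear in the input.
    static List<Person> maxIdPerName(List<Person> people) {
        return new ArrayList<>(people.stream()
            .collect(Collectors.groupingBy(
                Person::getName,
                LinkedHashMap::new, // preserves first-encounter order of names
                Collectors.collectingAndThen(
                    Collectors.maxBy(Comparator.comparingInt(Person::getID)),
                    Optional::get))) // safe: a group is never empty
            .values());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
            new Person("Jerry", 993),
            new Person("Tom", 3),
            new Person("Neal", 443),
            new Person("Jerry", 112),
            new Person("Shannon", 259),
            new Person("Shannon", 533));

        System.out.println(maxIdPerName(people));
        // prints [993: Jerry, 3: Tom, 443: Neal, 533: Shannon]
    }
}
```

The input list is left unchanged; the method returns a new list.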
    

    Update

    Is there a way to get a separate list of the Person objects that were removed as duplicates within this stream()?

    It may be better to collect each group into a list and then convert that list into a wrapper class that holds both the person with the highest ID and the list of persons dropped as duplicates:

    class PersonList {
        private final Person max;
        private final List<Person> deduped;
        
        public PersonList(List<Person> group) {
            this.max = Collections.max(group, Comparator.comparingInt(Person::getID));
            this.deduped = new ArrayList<>(group);
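            // Objects.equals (java.util.Objects) compares by value; == on boxed Integer IDs would compare references
            this.deduped.removeIf(p -> Objects.equals(p.getID(), max.getID()));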
        }
        
        @Override
        public String toString() {
            return "{max: " + max + "; deduped: " + deduped + "}";
        }
    }
    

    Then the persons can be collected like this (LinkedHashMap keeps the groups in the order in which each name is first encountered):

    List<PersonList> maxByIdDetails = new ArrayList<>(persons
        .stream()
        .collect(Collectors.groupingBy(
            Person::getName, 
            LinkedHashMap::new,
            Collectors.collectingAndThen(
                Collectors.toList(), PersonList::new
            )
        ))
        .values()); // Collection<PersonList>
    
    maxByIdDetails.forEach(System.out::println);
    

    Output:

    {max: 456: Jerry; deduped: [123: Jerry]}
    {max: 234: Tom; deduped: []}
    {max: 789: Jake; deduped: []}
    

    Update 2

    To get only the list of duplicated persons:

    List<Person> duplicates = persons
        .stream()
        .collect(Collectors.groupingBy(Person::getName))
        .values() // Collection<List<Person>>
        .stream() // Stream<List<Person>>
        .map(MyClass::removeMax)
        .flatMap(List::stream) // Stream<Person>
        .collect(Collectors.toList()); // List<Person>
    
    System.out.println(duplicates);
    

    Output:

    [123: Jerry]
    

    where removeMax, a static helper in the enclosing class (MyClass here), may be implemented like this:

    private static List<Person> removeMax(List<Person> group) {
        List<Person> dupes = new ArrayList<>();
        Person max = null;
    
        for (Person p : group) {
            Person duped = null;
            if (null == max) {
                max = p;
            } else if (p.getID() > max.getID()) {
                duped = max;
                max = p;
            } else {
                duped = p;
            }
            if (null != duped) {
                dupes.add(duped);
            }
        }
        return dupes;
    }
    

    Or, provided that hashCode and equals are implemented properly in class Person, the difference between the two lists may be calculated using removeAll:

    List<Person> duplicates2 = new ArrayList<>(persons);
    duplicates2.removeAll(maxById);
    System.out.println(duplicates2);
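
As a rough illustration of what "implemented properly" could mean here, the sketch below defines a hypothetical Person whose equals and hashCode are based on both fields, then computes the difference with removeAll. The class layout and the helper name difference are assumptions, not part of the answer. The kept list is built from separate but equal instances on purpose, to show that removeAll matches elements through equals rather than object identity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class RemoveAllDemo {
    // Hypothetical Person with value-based equality on name and id.
    static final class Person {
        private final String name;
        private final Integer id;

        Person(String name, Integer id) {
            this.name = name;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return Objects.equals(name, other.name) && Objects.equals(id, other.id);
        }

        @Override
        public int hashCode() { return Objects.hash(name, id); }

        @Override
        public String toString() { return id + ": " + name; }
    }

    // Returns the elements of 'all' that are not equal to any element of 'kept'.
    static List<Person> difference(List<Person> all, List<Person> kept) {
        List<Person> result = new ArrayList<>(all); // copy, because Arrays.asList is fixed-size
        result.removeAll(kept);
        return result;
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
            new Person("Jerry", 123),
            new Person("Tom", 234),
            new Person("Jerry", 456),
            new Person("Jake", 789));

        // Equal-by-value copies of the persons that were kept.
        List<Person> maxById = Arrays.asList(
            new Person("Jake", 789),
            new Person("Tom", 234),
            new Person("Jerry", 456));

        System.out.println(difference(persons, maxById));
        // prints [123: Jerry]
    }
}
```

If maxById holds the very same instances as persons, as in the stream pipeline above, the default identity-based equals would also give this result; overriding equals matters once the two lists may contain distinct but equal objects.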