Search code examples
javajava-8functional-programmingjava-stream

Grouping by multiple fields and counting using in Java 8


I have a Person table in database and the associated domain class looks like below:

public class Person {   
    
    private String firstName;
    private String secondName;
    private String city;
    private String country;
    private int age;
    private int hits;
    
    // accessors
} 

Database table will have multiple rows for same person and I need to aggregate the hits for each using java 8.

A person is uniquely identified by following fields: firstName, secondName, city, country

I have written the below logic and it gives me the expected result

public void generateReport(List<PersonDTO> persons) {
        Map<String, Integer> personHitCount =
                persons
                        .stream()
                        .collect(Collectors.groupingBy(l -> getPersonKey(l), Collectors.summingInt(Person::getHits)));

        List<Person> reportRecords =
                persons
                        .stream().collect(Collectors.toMap(l -> getPersonKey(l), Function.identity(), (o1, o2) -> o1))
                        .entrySet()
                        .stream()
                        .filter(e -> personHitCount.containsKey(e.getKey()))
                        .map(e -> transform(e.getValue(), personHitCount.get(e.getKey())))
                        .collect(Collectors.toList());

        reportRecords.stream().forEach(System.out::println);
  }

private Person transform(Person personDTO, int count) {
    return new Person(
            personDTO.getFirstName(),
            personDTO.getSecondName(),
            personDTO.getCity(),
            personDTO.getCountry(),
            personDTO.getAge(),
            count);
}

private String getPersonKey(Person person) {
    return new StringJoiner("-")
            .add(person.getFirstName())
            .add(person.getSecondName())
            .add(person.getCity())
            .add(person.getCountry())
            .toString();
}

I am not sure, if this is a good and performant way of doing it, as I am looping twice over persons list. Please can you suggest any improvements to this code or a better way of doing it.


Solution

  • Solution with Double Iteration and Same Return Type

    You could improve your stream by removing some unnecessary operations, like converting a DTO into a String and filtering (you're filtering for the keys coming from the very map you're streaming...). That would give you a more compact and readable stream.

    As @daniu suggested in the comment section, a better approach would be to get the total number of hits from SQL, as the language provides aggregate functions for this purpose. However, I assume that perhaps you didn't go this way because your SQL query would look something like this SELECT firstName, secondName, city, country, SUM(hits) FROM ... leaving behind other fields that are not necessary for the group by but that you need to construct a Person instance (this is just a guess).

    In this solution, I assume that your PersonDTO class honors the equals and hashcode contract by providing an implementation based on the attributes: firstName, secondName, city and country.

    public void generateReport2(List<PersonDTO> personsDTO) {
            List<Person> reportRecords = personsDTO.stream()
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.summingInt(PersonDTO::getHits))) //summing the total hits by DTO, where two DTOs are said equals by firstName, secondName, city, and country
                    .entrySet().stream() //streaming the map's entries
                    .map(e -> transform(e.getKey(), e.getValue())) //mapping each entry to a Person with the DTO (key) and the total hits (value)
                    .collect(Collectors.toList()); //collecting the Person instance
    
            reportRecords.stream().forEach(System.out::println);
    }
    
    @Data
    @NoArgsConstructor
    @AllArgsConstructor
    @EqualsAndHashCode(onlyExplicitlyIncluded = true)
    public class PersonDTO {
    
            @EqualsAndHashCode.Include
            private String firstName;
            @EqualsAndHashCode.Include
            private String secondName;
            @EqualsAndHashCode.Include
            private String city;
            @EqualsAndHashCode.Include
            private String country;
            private int age;
            private int hits;
    }
    

    Solution with Single Iteration and Different Return Type

    I'm updating my answer to provide a second solution with a more substantial improvement where the elements are iterated only once. However, it requires changing the method's return type from List<Person> to Collection<Person> (I know that the method was void, but I assume it was printing the results instead of returning just for demonstration’s sake).

    Changing the return type from List<Person> to Collection<Person> would imply generalizing to a higher interface in the class hierarchy, as Collection is a parent of List, which might not be doable if you've already agreed on a specific contract/interface with other developers. This is why I'm posting this second solution as an update and not as the actual answer.

    In short, the total numbers of hits are computed using Person instances, instead of Integers. Basically, each DTO is used as a key, and whenever two DTOs collide (they're equal) the hits of their corresponding Person are combined, keeping only the resulting Person instance.

    public static void generateReport3(List<PersonDTO> personsDTO) {
        Collection<Person> reportRecords = personsDTO.stream() // Mapping each DTO to a Person and using the DTO as the key
                .collect(Collectors.toMap(Function.identity(), dto -> transform(dto, dto.getHits()), (p1, p2) -> { 
                    p1.setHits(p1.getHits() + p2.getHits()); // Whenever two DTOs collide (they're equal), they combine the hits of their corresponding Person
                    return p1;  //keeping the combined Person instance
                }))
                .values();  //Retrieving the Collection of Person
    
        reportRecords.stream().forEach(System.out::println);
    }
    

    Demo Version

    Here is a demo version at OneCompiler. In the demo, I've assumed that both Person and PersonDTO possess the same set of attributes, since there is no implementation of the DTO class attached. I've also updated the demo with the second solution.