I have a Person table in database and the associated domain class looks like below:
public class Person {
private String firstName;
private String secondName;
private String city;
private String country;
private int age;
private int hits;
// accessors
}
Database table will have multiple rows for same person and I need to aggregate the hits for each using java 8.
A person is uniquely identified by following fields: firstName, secondName, city, country
I have written the below logic and it gives me the expected result
public void generateReport(List<PersonDTO> persons) {
Map<String, Integer> personHitCount =
persons
.stream()
.collect(Collectors.groupingBy(l -> getPersonKey(l), Collectors.summingInt(Person::getHits)));
List<Person> reportRecords =
persons
.stream().collect(Collectors.toMap(l -> getPersonKey(l), Function.identity(), (o1, o2) -> o1))
.entrySet()
.stream()
.filter(e -> personHitCount.containsKey(e.getKey()))
.map(e -> transform(e.getValue(), personHitCount.get(e.getKey())))
.collect(Collectors.toList());
reportRecords.stream().forEach(System.out::println);
}
private Person transform(Person personDTO, int count) {
return new Person(
personDTO.getFirstName(),
personDTO.getSecondName(),
personDTO.getCity(),
personDTO.getCountry(),
personDTO.getAge(),
count);
}
private String getPersonKey(Person person) {
return new StringJoiner("-")
.add(person.getFirstName())
.add(person.getSecondName())
.add(person.getCity())
.add(person.getCountry())
.toString();
}
I am not sure, if this is a good and performant way of doing it, as I am looping twice over persons list. Please can you suggest any improvements to this code or a better way of doing it.
You could improve your stream by removing some unnecessary operations, like converting a DTO into a String
and filtering (you're filtering for the keys coming from the very map you're streaming...). That would give you a more compact and readable stream.
As @daniu suggested in the comment section, a better approach would be to get the total number of hits from SQL, as the language provides aggregate functions for this purpose. However, I assume that perhaps you didn't go this way because your SQL query would look something like this SELECT firstName, secondName, city, country, SUM(hits) FROM ...
leaving behind other fields that are not necessary for the group by but that you need to construct a Person
instance (this is just a guess).
In this solution, I assume that your PersonDTO
class honors the equals and hashcode contract by providing an implementation based on the attributes: firstName
, secondName
, city
and country
.
public void generateReport2(List<PersonDTO> personsDTO) {
List<Person> reportRecords = personsDTO.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.summingInt(PersonDTO::getHits))) //summing the total hits by DTO, where two DTOs are said equals by firstName, secondName, city, and country
.entrySet().stream() //streaming the map's entries
.map(e -> transform(e.getKey(), e.getValue())) //mapping each entry to a Person with the DTO (key) and the total hits (value)
.collect(Collectors.toList()); //collecting the Person instance
reportRecords.stream().forEach(System.out::println);
}
@Data
@NoArgsConstructor
@AllArgsConstructor
@EqualsAndHashCode(onlyExplicitlyIncluded = true)
public class PersonDTO {
@EqualsAndHashCode.Include
private String firstName;
@EqualsAndHashCode.Include
private String secondName;
@EqualsAndHashCode.Include
private String city;
@EqualsAndHashCode.Include
private String country;
private int age;
private int hits;
}
I'm updating my answer to provide a second solution with a more substantial improvement where the elements are iterated only once. However, it requires changing the method's return type from List<Person>
to Collection<Person>
(I know that the method was void, but I assume it was printing the results instead of returning just for demonstration’s sake).
Changing the return type from List<Person>
to Collection<Person>
would imply generalizing to a higher interface in the class hierarchy, as Collection
is a parent of List
, which might not be doable if you've already agreed on a specific contract/interface with other developers. This is why I'm posting this second solution as an update and not as the actual answer.
In short, the total numbers of hits are computed using Person
instances, instead of Integers. Basically, each DTO is used as a key, and whenever two DTOs collide (they're equal) the hits of their corresponding Person
are combined, keeping only the resulting Person
instance.
public static void generateReport3(List<PersonDTO> personsDTO) {
Collection<Person> reportRecords = personsDTO.stream() // Mapping each DTO to a Person and using the DTO as the key
.collect(Collectors.toMap(Function.identity(), dto -> transform(dto, dto.getHits()), (p1, p2) -> {
p1.setHits(p1.getHits() + p2.getHits()); // Whenever two DTOs collide (they're equal), they combine the hits of their corresponding Person
return p1; //keeping the combined Person instance
}))
.values(); //Retrieving the Collection of Person
reportRecords.stream().forEach(System.out::println);
}
Here is a demo version at OneCompiler. In the demo, I've assumed that both Person
and PersonDTO
possess the same set of attributes, since there is no implementation of the DTO class attached. I've also updated the demo with the second solution.