Search code examples
javaarrayscollectionsaggregate

Collection GroupBy Aggregates without Stream API


I have a simple ArrayList with people's name and their ages (Person object).

class Person {
    String name;
    int age;
}

List<Person> personList = new ArrayList<>();
personList.add(new Person("john", 47));
personList.add(new Person("john", 35));
personList.add(new Person("john", 12));
personList.add(new Person("britney", 27));
personList.add(new Person("britney", 16));

The goal is to produce ArrayList with people, who has max age for every name, but without using Stream API. Similar to SQL's SELECT name, max(AGE) FROM P GROUP BY name:

[Person("john", 47), Person("britney", 27)]

With Stream API I did it that way:

personList.stream()
        .collect(Collectors
            .groupingBy(Person::getName,
                    Collectors.maxBy(Comparator.comparing(Person::getAge))))
            .values().stream()
            .map(s -> s.orElseGet(Person::new))
            .collect(Collectors.toList());

But I doubt, whether I did it optimally without Stream API (the code is below).

The question: Is the code below an optimal way? If not, please describe yours with mentioning O-complexity.

Second question: Which O-complexity the solution with Stream API above has? Surprisingly, using functional approach I can't grasp it.

Map<String, Integer> personAgg = new HashMap<>();
personList.forEach(personAdd -> {
    var currentKey=personAdd.getName();
    var currentValue=personAdd.getAge();
    if (personAgg.containsKey(currentKey)) {
        if (personAgg.get(currentKey) < currentValue)
            personAgg.put(currentKey, currentValue);
    } else {
        personAgg.put(currentKey, currentValue);
    }
});

List<Person> personAggArray = new ArrayList<>();
personAgg.forEach((k,v) -> personAggArray.add(new Person(k,v)));

Solution

  • The time complexity of doing this has to be at least O(n), where n is the number of elements in personList. To find the max age of each person, you have to check every person.

    There is very little to no time complexity guarantees on stream operations, so here I will assume the OpenJDK 17 implementation which I have on my machine. That said, it's hard to imagine anyone would write a non-optimal implementation.

    Your solution using streams is O(n). Each element is processed first by the groupingBy collector to be put in one of the groups, then gets consumed by the maxBy collector, which operates on each group. The things that both collectors do are done in constant time. groupingBy just checks if it matches any existing group (this is done by a HashMap lookup), and maxBy compares it with the current max using the comparator provided.

    Your solution not using streams is also O(n) - you do a bunch of constant time things (reading and writing to personAgg) for each element in personList. So it is optimal as far as time complexity is concerned.

    Note that you could also rewrite the loop body to just one call to compute:

    for (var person: personList) {
        personAgg.compute(
            person.getMame(),
            (k, v) -> Math.max(
                Objects.requireNonNullElse(v, Integer.MIN_VALUE),
                person.getAge()
            )
        );
    }
    

    Of course, this only avoids accessing the map multiple times, and change the time complexity.