Java Streams GroupingBy and filtering by count (similar to SQL's HAVING)


Do Java (9+) streams support a HAVING clause similar to SQL's? Use case: grouping and then dropping all groups with a certain count. Is it possible to write the following SQL clause as a Java stream?

GROUP BY id
HAVING COUNT(*) > 5

The closest I could come up with was:

input.stream()
        .collect(groupingBy(x -> x.id()))
        .entrySet()
        .stream()
        .filter(entry -> entry.getValue().size() > 5)
        .collect(toMap(Map.Entry::getKey, Map.Entry::getValue));

But extracting the entrySet of the grouped result in order to collect a second time feels strange; in particular, the terminal collect call basically maps a map to itself.

I see that there are collectingAndThen and filtering collectors, but I don't know if they would solve my problem (or rather how to apply them correctly).

Is there a better (more idiomatic) version of the above, or am I stuck with collecting to an intermediate map, filtering that and then collecting to the final map?


Solution

  • In general, the operation has to be performed after the grouping, as you need to fully collect a group before you can determine whether it fulfills the criterion.

    Instead of collecting a map into another, similar map, you can use removeIf to remove non-matching groups from the result map and inject this finishing operation into the collector:

    Map<KeyType, List<ElementType>> result =
        input.stream()
            .collect(collectingAndThen(groupingBy(x -> x.id(), HashMap::new, toList()),
                m -> {
                    m.values().removeIf(l -> l.size() <= 5);
                    return m;
                }));
    

    Since the groupingBy(Function) collector makes no guarantees regarding the mutability of the created map, we need to specify a supplier for a mutable map, which requires us to be explicit about the downstream collector, as there is no overloaded groupingBy for specifying only function and map supplier.
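As a self-contained illustration of the approach above, here is a minimal sketch. The `GroupHavingDemo` class, the `Item` type and the `groupsLargerThan` method are hypothetical names made up for this example; only the collector chain itself is taken from the snippet above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupHavingDemo {
    // Hypothetical element type; stands in for whatever the input holds.
    static final class Item {
        private final int id;
        private final String name;
        Item(int id, String name) { this.id = id; this.name = name; }
        int id() { return id; }
        String name() { return name; }
    }

    // Rough equivalent of: GROUP BY id HAVING COUNT(*) > threshold
    static Map<Integer, List<Item>> groupsLargerThan(List<Item> input, int threshold) {
        return input.stream()
            .collect(Collectors.collectingAndThen(
                // The explicit HashMap::new supplier guarantees a mutable result map.
                Collectors.groupingBy(Item::id, HashMap::new, Collectors.toList()),
                m -> {
                    // Drop every group that does not satisfy the HAVING condition.
                    m.values().removeIf(group -> group.size() <= threshold);
                    return m;
                }));
    }

    public static void main(String[] args) {
        List<Item> input = new ArrayList<>();
        for (int i = 0; i < 6; i++) input.add(new Item(1, "a" + i));
        for (int i = 0; i < 2; i++) input.add(new Item(2, "b" + i));
        System.out.println(groupsLargerThan(input, 5).keySet()); // prints [1]
    }
}
```

The group for id 2 has only two elements, so it is removed by the finisher before the map is returned.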

    If this is a recurring task, we can write a custom collector that simplifies the code using it:

    public static <T,K,V> Collector<T,?,Map<K,V>> having(
                          Collector<T,?,? extends Map<K,V>> c, BiPredicate<K,V> p) {
        return collectingAndThen(c, in -> {
            Map<K,V> m = in;
            if(!(m instanceof HashMap)) m = new HashMap<>(m);
            m.entrySet().removeIf(e -> !p.test(e.getKey(), e.getValue()));
            return m;
        });
    }
    

    For more flexibility, this collector accepts an arbitrary map-producing collector. Since that does not enforce a particular map type, the finisher ensures a mutable map afterwards by copying the result into a new HashMap whenever it is not already a HashMap. In practice, this copy won't happen, as the default is to use a HashMap. No copy is made either when the caller explicitly requests a LinkedHashMap to maintain the order, because LinkedHashMap is a subclass of HashMap. We could even support more cases by changing the line to

    if(!(m instanceof HashMap || m instanceof TreeMap
      || m instanceof EnumMap || m instanceof ConcurrentMap)) {
        m = new HashMap<>(m);
    }
    

    Unfortunately, there is no standard way to determine whether a map is mutable.
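To illustrate why the defensive copy matters, here is a hedged sketch that feeds `having` a collector whose result is wrapped with `Collections.unmodifiableMap`. The class name `HavingCopyDemo` and the method `repeatedLengths` are hypothetical and exist only for this example; `having` is the collector from above with the extended `instanceof` check.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.BiPredicate;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class HavingCopyDemo {
    static <T, K, V> Collector<T, ?, Map<K, V>> having(
            Collector<T, ?, ? extends Map<K, V>> c, BiPredicate<K, V> p) {
        return Collectors.collectingAndThen(c, in -> {
            Map<K, V> m = in;
            // Copy unless the map is one of the known mutable implementations.
            if (!(m instanceof HashMap || m instanceof TreeMap
                    || m instanceof EnumMap || m instanceof ConcurrentMap)) {
                m = new HashMap<>(m);
            }
            m.entrySet().removeIf(e -> !p.test(e.getKey(), e.getValue()));
            return m;
        });
    }

    // Counts words per length and keeps only lengths occurring more than once.
    static Map<Integer, Long> repeatedLengths(List<String> words) {
        // A collector whose result map is read-only; removeIf on it would throw.
        Collector<String, ?, Map<Integer, Long>> readOnly =
            Collectors.collectingAndThen(
                Collectors.groupingBy(String::length, Collectors.counting()),
                Collections::unmodifiableMap);
        return words.stream().collect(having(readOnly, (len, count) -> count > 1));
    }

    public static void main(String[] args) {
        System.out.println(repeatedLengths(List.of("a", "bb", "cc", "ddd"))); // prints {2=2}
    }
}
```

Without the copy, calling `removeIf` on the unmodifiable wrapper would fail with an `UnsupportedOperationException`; with it, the caller gets a plain mutable `HashMap` back.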

    The custom collector can now be used nicely as

    Map<KeyType, List<ElementType>> result =
        input.stream()
            .collect(having(groupingBy(x -> x.id()), (key,list) -> list.size() > 5));
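For completeness, a runnable sketch of the custom collector combined with an explicitly requested LinkedHashMap, so that the surviving groups keep their encounter order. The class name `HavingOrderDemo` and the method `groupsOfAtLeast` are hypothetical names chosen for this example, and grouping strings by length is only a stand-in for grouping by `id()`.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class HavingOrderDemo {
    // Same collector as above: filters the entries of the collected map.
    static <T, K, V> Collector<T, ?, Map<K, V>> having(
            Collector<T, ?, ? extends Map<K, V>> c, BiPredicate<K, V> p) {
        return Collectors.collectingAndThen(c, in -> {
            Map<K, V> m = in;
            if (!(m instanceof HashMap)) m = new HashMap<>(m);
            m.entrySet().removeIf(e -> !p.test(e.getKey(), e.getValue()));
            return m;
        });
    }

    // Groups words by length and keeps groups with at least minSize members.
    static Map<Integer, List<String>> groupsOfAtLeast(List<String> words, int minSize) {
        return words.stream()
            .collect(having(
                // LinkedHashMap is a HashMap, so it is filtered in place, not copied.
                Collectors.groupingBy(String::length, LinkedHashMap::new, Collectors.toList()),
                (len, group) -> group.size() >= minSize));
    }

    public static void main(String[] args) {
        List<String> words = List.of("ddd", "aa", "fff", "bb", "c");
        System.out.println(groupsOfAtLeast(words, 2)); // prints {3=[ddd, fff], 2=[aa, bb]}
    }
}
```

The key 3 comes before 2 in the output because the first word of length 3 was encountered first and the LinkedHashMap is reused rather than copied.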