java group-by java-stream having collectors

Java Streams GroupingBy and filtering by count (similar to SQL's HAVING)

Do Java (9+) streams support a HAVING clause similar to SQL's? Use case: grouping and then dropping all groups with a certain count. Is it possible to write the following SQL clause as a Java stream?

GROUP BY id
HAVING COUNT(*) > 5

The closest I could come up with was:

input.stream()
        .collect(groupingBy(x -> x.id()))
        .entrySet()
        .stream()
        .filter(entry -> entry.getValue().size() > 5)
        .collect(toMap(Map.Entry::getKey, Map.Entry::getValue));

but extracting the entrySet of the grouped result in order to collect a second time feels strange, especially since the terminal collect call basically maps a map to itself.

I see that there are collectingAndThen and filtering collectors, but I don't know if they would solve my problem (or rather how to apply them correctly).

Is there a better (more idiomatic) version of the above, or am I stuck with collecting to an intermediate map, filtering that and then collecting to the final map?

Solution

In general, the operation has to be performed after the grouping, as you need to fully collect a group before you can determine whether it fulfills the criterion.
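To illustrate why the filtering collector mentioned in the question does not help here, the following minimal sketch (the class and method names are made up for this example) applies it as a downstream collector. It tests individual elements while the groups are still being built, so a group can end up empty but its key is not dropped:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FilteringVsHaving {
    // Groups numbers by their last digit; inside each group only elements > 10 are kept.
    // The predicate sees single elements, never the finished group.
    static Map<Integer, List<Integer>> groupWithFiltering(List<Integer> input) {
        return input.stream().collect(
            Collectors.groupingBy(n -> n % 10,
                Collectors.filtering(n -> n > 10, Collectors.toList())));
    }

    public static void main(String[] args) {
        // Key 2 stays in the result, mapped to an empty list.
        System.out.println(groupWithFiltering(List.of(1, 11, 21, 2)));
    }
}
```

So filtering corresponds more closely to a WHERE applied per group member than to a HAVING on the group as a whole.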

Instead of collecting a map into another, similar map, you can use removeIf to remove non-matching groups from the result map and inject this finishing operation into the collector:

Map<KeyType, List<ElementType>> result =
    input.stream()
        .collect(collectingAndThen(groupingBy(x -> x.id(), HashMap::new, toList()),
            m -> {
                m.values().removeIf(l -> l.size() <= 5);
                return m;
            }));

Since the groupingBy(Function) collector makes no guarantees regarding the mutability of the created map, we need to specify a supplier for a mutable map. This requires us to be explicit about the downstream collector, as there is no overloaded groupingBy that takes only a function and a map supplier.
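Because the downstream collector has to be spelled out anyway, it can just as well be swapped for counting() when only the counts are of interest, which comes closest to the SQL in the question. A minimal sketch, assuming plain String keys and a made-up helper name frequentKeys:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class HavingCount {
    // Counts occurrences per key, then drops keys seen `threshold` times or fewer,
    // roughly GROUP BY key HAVING COUNT(*) > threshold.
    static Map<String, Long> frequentKeys(List<String> keys, long threshold) {
        return keys.stream().collect(Collectors.collectingAndThen(
            // HashMap::new guarantees a mutable map for the removeIf below
            Collectors.groupingBy(k -> k, HashMap::new, Collectors.counting()),
            m -> {
                m.values().removeIf(count -> count <= threshold);
                return m;
            }));
    }

    public static void main(String[] args) {
        System.out.println(frequentKeys(List.of("a", "b", "a", "a", "c", "b"), 1));
    }
}
```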

If this is a recurring task, we can write a custom collector that simplifies the code using it:

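// Assumes: import static java.util.stream.Collectors.collectingAndThen;
public static <T,K,V> Collector<T,?,Map<K,V>> having(
                      Collector<T,?,? extends Map<K,V>> c, BiPredicate<K,V> p) {
    return collectingAndThen(c, in -> {
        Map<K,V> m = in;
        // copy when the map type is unknown and therefore possibly immutable
        if(!(m instanceof HashMap)) m = new HashMap<>(m);
        // drop every entry (group) that fails the predicate
        m.entrySet().removeIf(e -> !p.test(e.getKey(), e.getValue()));
        return m;
    });
}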

For higher flexibility, this collector accepts an arbitrary map-producing collector. Since that does not enforce a particular map type, the finisher ensures a mutable map afterwards by copying the result into a new HashMap whenever it is not already a HashMap. In practice, the copy rarely happens, as the default is to use a HashMap. It also works without copying when the caller explicitly requests a LinkedHashMap to maintain the order, since LinkedHashMap is a subclass of HashMap. We could even support more cases by changing the line to

if(!(m instanceof HashMap || m instanceof TreeMap
  || m instanceof EnumMap || m instanceof ConcurrentMap)) {
    m = new HashMap<>(m);
}

Unfortunately, there is no standard way to determine whether a map is mutable.

The custom collector can now be used nicely as

Map<KeyType, List<ElementType>> result =
    input.stream()
        .collect(having(groupingBy(x -> x.id()), (key,list) -> list.size() > 5));
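For completeness, here is a self-contained sketch that can be compiled and run as is. It repeats the having collector from above; the sample data and the helper names byLength and longWords are made up for illustration. The second helper feeds in toUnmodifiableMap (Java 10+) to exercise the branch that copies a non-HashMap result:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class HavingDemo {
    // Same collector as above, repeated so that this sketch compiles on its own.
    static <T, K, V> Collector<T, ?, Map<K, V>> having(
            Collector<T, ?, ? extends Map<K, V>> c, BiPredicate<K, V> p) {
        return Collectors.collectingAndThen(c, in -> {
            Map<K, V> m = in;
            if (!(m instanceof HashMap)) m = new HashMap<>(m); // copy if possibly immutable
            m.entrySet().removeIf(e -> !p.test(e.getKey(), e.getValue()));
            return m;
        });
    }

    // Groups words by length and keeps only lengths that occur more than `min` times.
    static Map<Integer, List<String>> byLength(List<String> words, int min) {
        return words.stream().collect(
            having(Collectors.groupingBy(String::length),
                   (len, list) -> list.size() > min));
    }

    // Maps each distinct word to its length via an unmodifiable map,
    // then keeps only words of at least `minLength` characters.
    static Map<String, Integer> longWords(List<String> words, int minLength) {
        return words.stream().distinct().collect(
            having(Collectors.toUnmodifiableMap(Function.identity(), String::length),
                   (word, len) -> len >= minLength));
    }

    public static void main(String[] args) {
        System.out.println(byLength(List.of("a", "b", "cc", "d"), 2));
        System.out.println(longWords(List.of("a", "bbb", "cccc", "bbb"), 3));
    }
}
```

Note that the predicate receives both key and value, so conditions are not limited to the group size.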