java-8 java-stream collectors

Collect groupBy on deep property


private Map<String, Set<Square>> populateZuloSquare(List<Square> squares) {
    if (squares == null || squares.isEmpty()) {
        return emptyMap();
    }

    Map<String, Set<Square>> res = new HashMap<>();

    squares.stream()
        .filter(square -> {
            if (square.getZuloCodes().isEmpty()) {
                LOG("Ignored {}", square.id);
                return false;
            }
            return true;
        })
        .forEach(square -> {
          square.getZuloCodes()
            .forEach(code -> {
                res.putIfAbsent(code, new HashSet<>());
                res.get(code).add(square);
            });
        });

    return Collections.unmodifiableMap(res);
}

The code above receives a list of Squares, each of which may contain ZuloCodes. The output should be an immutable Map whose keys are the zulo codes and whose values are all the squares with that code. As you can see, I cannot figure out a way to remove the auxiliary collection res and make the code easily readable. Is there a way to explode that collection into [zuloCode, square] pairs and then collect with groupingBy? Also, the if inside the filter is quite unreadable; how would you tackle it?


Solution

  • The standard approach is to use flatMap before collecting with groupingBy, but since you need the original Square for each element, you need to map to an object holding both the Square instance and the zulo code String.

    Since there is no standard pair or tuple type in Java (yet), a workaround is to use a Map.Entry instance, like this:

    private Map<String, Set<Square>> populateZuloSquare0(List<Square> squares) {
        if (squares == null || squares.isEmpty()) {
            return emptyMap();
        }
        return squares.stream()
            .filter(square -> logMismatch(square, !square.getZuloCodes().isEmpty()))
            .flatMap(square -> square.getZuloCodes().stream()
                .map(code -> new AbstractMap.SimpleEntry<>(code, square)))
            .collect(Collectors.collectingAndThen(
                Collectors.groupingBy(Map.Entry::getKey,
                    Collectors.mapping(Map.Entry::getValue, Collectors.toSet())),
                Collections::unmodifiableMap));
    }
    private static boolean logMismatch(Square square, boolean match) {
        if(!match) LOG("Ignored {}", square.id);
        return match;
    }
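
For readers who want to run the flatMap approach in isolation, here is a minimal, self-contained sketch. The Square class below is a hypothetical stand-in (the question does not show the real one), and the names FlatMapGroupingDemo and groupByCode are chosen for illustration only. Logging is left out.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for the question's Square class.
class Square {
    final String id;
    private final List<String> zuloCodes;

    Square(String id, String... zuloCodes) {
        this.id = id;
        this.zuloCodes = Arrays.asList(zuloCodes);
    }

    List<String> getZuloCodes() {
        return zuloCodes;
    }
}

class FlatMapGroupingDemo {
    // Expands every square into (code, square) pairs, then groups the pairs by code.
    static Map<String, Set<Square>> groupByCode(List<Square> squares) {
        return squares.stream()
            .flatMap(square -> square.getZuloCodes().stream()
                .map(code -> new AbstractMap.SimpleEntry<>(code, square)))
            .collect(Collectors.collectingAndThen(
                Collectors.groupingBy(Map.Entry::getKey,
                    Collectors.mapping(Map.Entry::getValue, Collectors.toSet())),
                Collections::unmodifiableMap));
    }

    public static void main(String[] args) {
        Square a = new Square("a", "X", "Y");
        Square b = new Square("b", "Y");
        Square c = new Square("c"); // no codes at all

        Map<String, Set<Square>> byCode = groupByCode(Arrays.asList(a, b, c));

        // Print each code with the sorted ids of its squares.
        byCode.forEach((code, squares) -> System.out.println(code + " -> "
            + squares.stream().map(s -> s.id).sorted().collect(Collectors.toList())));
    }
}
```

A square without codes produces an empty inner stream, so flatMap drops it without any explicit filter; the filter step in the version above is only needed for the log message.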
    

    An alternative is to use a custom collector which will iterate over the keys:

    private Map<String, Set<Square>> populateZuloSquare(List<Square> squares) {
        if (squares == null || squares.isEmpty()) {
            return emptyMap();
        }
        return squares.stream()
            .filter(square -> logMismatch(square, !square.getZuloCodes().isEmpty()))
            .collect(Collector.of(
                HashMap<String, Set<Square>>::new,
                (m,square) -> square.getZuloCodes()
                    .forEach(code -> m.computeIfAbsent(code, x -> new HashSet<>()).add(square)),
                (m1,m2) -> {
                    if(m1.isEmpty()) return m2;
                    m2.forEach((key,set) ->
                        m1.merge(key, set, (s1,s2) -> { s1.addAll(s2); return s1; }));
                    return m1;
                },
                Collections::unmodifiableMap)
            );
    }
    

    Note that this custom collector can be seen as a parallel-capable variant of the following looping code:

    private Map<String, Set<Square>> populateZuloSquare(List<Square> squares) {
        if (squares == null || squares.isEmpty()) {
            return emptyMap();
        }
        Map<String, Set<Square>> res = new HashMap<>();
        squares.forEach(square -> {
            if(square.getZuloCodes().isEmpty()) LOG("Ignored {}", square.id);
            else square.getZuloCodes().forEach(
                code -> res.computeIfAbsent(code, x -> new HashSet<>()).add(square));
        });
        return Collections.unmodifiableMap(res);
    }
    

    which might not look so bad after all if you don't need the code to be parallel-capable.
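
To check that the custom collector and the plain loop really agree, the following sketch runs the collector on a parallel stream and compares its result with the loop's. It is a minimal illustration under assumptions: Square is a hypothetical stand-in for the question's class, the names CollectorVsLoopDemo, viaCollector and viaLoop are invented here, and the logging of ignored squares is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical stand-in for the question's Square class.
class Square {
    final String id;
    private final List<String> zuloCodes;

    Square(String id, String... zuloCodes) {
        this.id = id;
        this.zuloCodes = Arrays.asList(zuloCodes);
    }

    List<String> getZuloCodes() {
        return zuloCodes;
    }
}

class CollectorVsLoopDemo {
    // Custom collector: each thread fills its own HashMap, the combiner merges them.
    static Map<String, Set<Square>> viaCollector(Stream<Square> squares) {
        return squares.collect(Collector.of(
            HashMap<String, Set<Square>>::new,
            (m, square) -> square.getZuloCodes()
                .forEach(code -> m.computeIfAbsent(code, x -> new HashSet<>()).add(square)),
            (m1, m2) -> {
                if (m1.isEmpty()) return m2;
                m2.forEach((key, set) ->
                    m1.merge(key, set, (s1, s2) -> { s1.addAll(s2); return s1; }));
                return m1;
            },
            Collections::unmodifiableMap));
    }

    // Sequential loop doing the same grouping with an auxiliary map.
    static Map<String, Set<Square>> viaLoop(List<Square> squares) {
        Map<String, Set<Square>> res = new HashMap<>();
        squares.forEach(square -> square.getZuloCodes().forEach(
            code -> res.computeIfAbsent(code, x -> new HashSet<>()).add(square)));
        return Collections.unmodifiableMap(res);
    }

    public static void main(String[] args) {
        List<Square> squares = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            squares.add(new Square("s" + i, "c" + (i % 7), "d" + (i % 11)));
        }
        Map<String, Set<Square>> parallel = viaCollector(squares.parallelStream());
        Map<String, Set<Square>> sequential = viaLoop(squares);
        System.out.println("same result: " + parallel.equals(sequential));
    }
}
```

The combiner is what the loop lacks: on a parallel stream each worker accumulates into its own HashMap, and the partial maps are merged afterwards, so no shared mutable map is touched concurrently.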