java, java-stream, collectors

How to fix Duplicate Key IllegalStateException while using Collectors.toMap()


I have a stream that processes some strings and collects them in a map.

But I'm getting the following exception:

java.lang.IllegalStateException:
Duplicate key test@yahoo.com
(attempted merging values [test@yahoo.com] and [test@yahoo.com])
at java.base/java.util.stream.Collectors.duplicateKeyException(Collectors.java:133)

I'm using the following code:

Map<String, List<String>> map = emails.stream()
    .collect(Collectors.toMap(
        Function.identity(),
        email -> processEmails(email)
    ));

Solution

  • The flavor of toMap() you're using (the one that takes only a keyMapper and a valueMapper) disallows duplicate keys simply because it has no way to resolve them, and the exception message tells you so explicitly.

    Judging by the resulting type Map<String, List<String>> and by the exception message, which shows the strings enclosed in square brackets, we can conclude that processEmails(email) produces a List<String> (this isn't obvious from your description and is, IMO, worth specifying).
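
A minimal sketch that reproduces the failure, assuming processEmails() simply wraps the email in a list (the real implementation isn't shown in the question; DuplicateKeyDemo and tryCollect are names made up for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class DuplicateKeyDemo {

    // Hypothetical stand-in for the question's processEmails().
    static List<String> processEmails(String email) {
        return List.of(email);
    }

    // Collects with the two-argument toMap() and reports what happened.
    static String tryCollect(List<String> emails) {
        try {
            Map<String, List<String>> map = emails.stream()
                .collect(Collectors.toMap(
                    Function.identity(),
                    DuplicateKeyDemo::processEmails
                ));
            return "collected " + map.size() + " entries";
        } catch (IllegalStateException e) {
            // Thrown as soon as the same key is produced a second time.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Unique emails: the collector succeeds.
        System.out.println(tryCollect(List.of("a@example.com", "b@example.com")));
        // Repeated email: prints the "Duplicate key ..." message instead.
        System.out.println(tryCollect(List.of("test@yahoo.com", "test@yahoo.com")));
    }
}
```

So the input only has to contain the same email twice for the two-argument collector to fail.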

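    There are multiple ways to solve this problem. You can either:

    • Use the three-argument version, toMap(keyMapper, valueMapper, mergeFunction), whose third argument is a function responsible for resolving duplicates.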

    Map<String, List<String>> map = emails.stream()
        .collect(Collectors.toMap(
            Function.identity(),
            email -> processEmails(email),
            (list1, list2) -> list1 // or { list1.addAll(list2); return list1; }, depending on how you need to resolve duplicates
        ));
    
    • Use the collector groupingBy(classifier, downstream) to preserve all the emails returned by processEmails() for the same key by storing them in a List. As the downstream collector, we can combine flatMapping() (available since Java 9) and toList().

    Map<String, List<String>> map = emails.stream()
        .collect(Collectors.groupingBy(
            Function.identity(),
            Collectors.flatMapping(
                email -> processEmails(email).stream(),
                Collectors.toList()
            )
        ));
    

    Note that the latter option makes sense only if processEmails() can produce different results for the same key; otherwise you would end up with a list of repeated values, which doesn't seem useful.
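
To make the difference between these options concrete, here is a rough side-by-side sketch. It assumes a hypothetical processEmails() that returns a fresh mutable list containing just the email. DuplicateResolutionDemo and its method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class DuplicateResolutionDemo {

    // Hypothetical stand-in for processEmails(); returns a mutable list
    // so that the addAll() merge strategy below is allowed to modify it.
    static List<String> processEmails(String email) {
        return new ArrayList<>(List.of(email));
    }

    // toMap() with a merge function that keeps the first list.
    static Map<String, List<String>> keepFirst(List<String> emails) {
        return emails.stream().collect(Collectors.toMap(
            Function.identity(),
            DuplicateResolutionDemo::processEmails,
            (list1, list2) -> list1
        ));
    }

    // toMap() with a merge function that appends the second list to the first.
    static Map<String, List<String>> mergeLists(List<String> emails) {
        return emails.stream().collect(Collectors.toMap(
            Function.identity(),
            DuplicateResolutionDemo::processEmails,
            (list1, list2) -> { list1.addAll(list2); return list1; }
        ));
    }

    // groupingBy() + flatMapping() (Java 9+): gathers every result per key.
    static Map<String, List<String>> groupAll(List<String> emails) {
        return emails.stream().collect(Collectors.groupingBy(
            Function.identity(),
            Collectors.flatMapping(
                email -> processEmails(email).stream(),
                Collectors.toList()
            )
        ));
    }

    public static void main(String[] args) {
        List<String> emails = List.of("test@yahoo.com", "test@yahoo.com", "other@example.com");
        System.out.println(keepFirst(emails).get("test@yahoo.com"));  // [test@yahoo.com]
        System.out.println(mergeLists(emails).get("test@yahoo.com")); // [test@yahoo.com, test@yahoo.com]
        System.out.println(groupAll(emails).get("test@yahoo.com"));   // [test@yahoo.com, test@yahoo.com]
    }
}
```

With this deterministic stand-in, both the appending merge and groupingBy() yield repeated values, which is exactly the case described above. Also keep in mind that list1.addAll(list2) throws UnsupportedOperationException if processEmails() returns an immutable list such as List.of(...).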

    But what you definitely shouldn't do in this case is use distinct(). It would unnecessarily increase memory consumption, because it eliminates duplicates by keeping track of every element seen so far in an internal set under the hood. That is wasteful, because you're already collecting into a Map, which can deal with duplicate keys once it is given a merge function.
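
As a quick sanity check of that claim, the sketch below compares distinct() followed by the two-argument toMap() against the three-argument toMap() alone. It assumes a deterministic, hypothetical processEmails(); the class and method names are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class DistinctVsMergeDemo {

    // Hypothetical stand-in for processEmails().
    static List<String> processEmails(String email) {
        return List.of(email);
    }

    // Deduplicates up front; distinct() has to remember every element it has seen.
    static Map<String, List<String>> withDistinct(List<String> emails) {
        return emails.stream()
            .distinct()
            .collect(Collectors.toMap(
                Function.identity(),
                DistinctVsMergeDemo::processEmails
            ));
    }

    // Lets the map resolve duplicates itself by keeping the first value.
    static Map<String, List<String>> withMerge(List<String> emails) {
        return emails.stream()
            .collect(Collectors.toMap(
                Function.identity(),
                DistinctVsMergeDemo::processEmails,
                (first, second) -> first
            ));
    }

    public static void main(String[] args) {
        List<String> emails = List.of("test@yahoo.com", "test@yahoo.com", "other@example.com");
        System.out.println(withDistinct(emails).equals(withMerge(emails))); // true
    }
}
```

One trade-off worth weighing: the merge-function variant still calls processEmails() once per stream element, duplicates included, so if that call is expensive the balance may shift.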