Group strings into multiple groups when using stream groupingBy


A simplified example of what I am trying to do:

Suppose I have a list of strings that need to be sorted into four groups (FOO, BAR, BAZ and DEFAULT) depending on which substrings they contain. A string containing Foo should fall into the group FOO, a string containing Bar into the group BAR, and a string containing both should appear in both groups.

List<String> strings = List.of("Foo", "FooBar", "FooBarBaz", "XXX");

A naive approach doesn't work as expected for the above input, because each string is placed only in the first matching group:

Map<String,List<String>> result1 =
strings.stream()
        .collect(Collectors.groupingBy(
                        str -> str.contains("Foo") ? "FOO" :
                                    str.contains("Bar") ? "BAR" :
                                            str.contains("Baz") ? "BAZ" : "DEFAULT"));

result1 is

{FOO=[Foo, FooBar, FooBarBaz], DEFAULT=[XXX]}

whereas the desired result is

{FOO=[Foo, FooBar, FooBarBaz], BAR=[FooBar, FooBarBaz], BAZ=[FooBarBaz], DEFAULT=[XXX]}

After searching for a while I found another approach, which comes close to my desired result but does not quite get there:

Map<String,List<String>> result2 =
List.of("Foo", "Bar", "Baz", "Default").stream()
        .flatMap(str -> strings.stream().filter(s -> s.contains(str)).map(s -> new String[]{str.toUpperCase(), s}))
        .collect(Collectors.groupingBy(arr -> arr[0], Collectors.mapping(arr -> arr[1], Collectors.toList())));

System.out.println(result2);

result2 is

{BAR=[FooBar, FooBarBaz], FOO=[Foo, FooBar, FooBarBaz], BAZ=[FooBarBaz]}

While this correctly places strings containing the substrings into the needed groups, the strings that don't contain any of the substrings, and therefore should fall into the default group, are ignored. The desired result is, as already mentioned above (order doesn't matter):

{BAR=[FooBar, FooBarBaz], FOO=[Foo, FooBar, FooBarBaz], BAZ=[FooBarBaz], DEFAULT=[XXX]}

For now I build both result maps and add an extra step:

result2.put("DEFAULT", result1.get("DEFAULT"));

Can the above be done in one step? Is there a better approach than what I have above?


Solution

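  • This is a good fit for mapMulti, which was added in Java 16. mapMulti takes a BiConsumer whose arguments are the streamed value and a Consumer. Each call to the Consumer places one element on the resulting stream, so calling it zero, one or several times per input drops, maps or expands that input. It was added because flatMap can incur undesirable overhead from creating a new stream for every element.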
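
As a minimal illustration of the mechanism, separate from the grouping problem, the sketch below calls the consumer twice for even numbers and not at all for odd ones. The class and method names (MapMultiDemo, duplicateEvens) are made up for this example, and it assumes Java 16 or later.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original question or answer.
class MapMultiDemo {

    // Emits every even number twice and drops odd numbers entirely.
    static List<Integer> duplicateEvens(List<Integer> numbers) {
        return numbers.stream()
                .<Integer>mapMulti((n, consumer) -> {
                    if (n % 2 == 0) {
                        consumer.accept(n); // first copy
                        consumer.accept(n); // second copy
                    }
                    // Odd numbers: the consumer is never called, so nothing is emitted.
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(duplicateEvens(List.of(1, 2, 3, 4))); // prints [2, 2, 4, 4]
    }
}
```

The explicit type witness `<Integer>` is there because the compiler usually cannot infer the output element type from the lambda alone.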

    The code below builds a String array holding the token and the containing string, as you did before, and collects it the same way. If a token is found in the string, the consumer accepts an array with that token and the string. If no token is found, it accepts an array with the default key and the string. Note that the comparison is done on the upper-cased string, so matching is case-insensitive.

    List<String> strings =
            List.of("Foo", "FooBar", "FooBarBaz", "XXX", "YYY");
    Map<String, List<String>> result = strings.stream()
            .<String[]>mapMulti((str, consumer) -> {
    
                boolean found = false;
                String temp = str.toUpperCase();
                for (String token : List.of("FOO", "BAR",
                        "BAZ")) {
                    if (temp.contains(token)) {
                        consumer.accept(
                                new String[] { token, str });
                        found = true;
                    }
                }
                if (!found) {
                    consumer.accept(
                            new String[] { "DEFAULT", str });
                }
            })
            .collect(Collectors.groupingBy(arr -> arr[0],
                    Collectors.mapping(arr -> arr[1],
                            Collectors.toList())));
    
    result.entrySet().forEach(System.out::println);
    

    prints

    BAR=[FooBar, FooBarBaz]
    FOO=[Foo, FooBar, FooBarBaz]
    BAZ=[FooBarBaz]
    DEFAULT=[XXX, YYY]
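
If mapMulti is not available (it requires Java 16), roughly the same single-pass grouping can be sketched with flatMap. This is only one possible variant, not the method from the answer above: the names FlatMapGrouping, keysFor and group are hypothetical, and it assumes Java 9 or later for List.of and Map.entry. It keeps the answer's case-insensitive matching and pays the per-element stream cost that mapMulti avoids.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class illustrating a flatMap-based alternative.
class FlatMapGrouping {

    // Returns every token contained in str (case-insensitive),
    // or the single key "DEFAULT" when no token matches.
    static List<String> keysFor(String str, List<String> tokens) {
        String upper = str.toUpperCase();
        List<String> keys = tokens.stream()
                .filter(upper::contains)
                .collect(Collectors.toList());
        return keys.isEmpty() ? List.of("DEFAULT") : keys;
    }

    // Expands each string into one (key, string) entry per matching key,
    // then groups the entries by key.
    static Map<String, List<String>> group(List<String> strings, List<String> tokens) {
        return strings.stream()
                .flatMap(str -> keysFor(str, tokens).stream()
                        .map(key -> Map.entry(key, str)))
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> strings = List.of("Foo", "FooBar", "FooBarBaz", "XXX", "YYY");
        group(strings, List.of("FOO", "BAR", "BAZ"))
                .forEach((key, values) -> System.out.println(key + "=" + values));
    }
}
```

Using Map.entry instead of a String array keeps the key and value roles explicit, at the cost of one small object per pair.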
    

    Keep in mind that streams are meant to make your coding life easier. Sometimes, though, a regular loop using a Java 8 construct such as Map.computeIfAbsent is all that is needed. Outside of an academic exercise, I would probably do the task like so.

    Map<String, List<String>> result2 = new HashMap<>();

    for (String str : strings) {
        boolean added = false;
        String temp = str.toUpperCase();
        // Add the string to the group of every token it contains.
        for (String token : List.of("FOO", "BAR", "BAZ")) {
            if (temp.contains(token)) {
                result2.computeIfAbsent(token, k -> new ArrayList<>()).add(str);
                added = true;
            }
        }
        // No token matched, so the string goes into the default group.
        if (!added) {
            result2.computeIfAbsent("DEFAULT", k -> new ArrayList<>()).add(str);
        }
    }